Preface



The papers in this volume were presented at the Fourth Italian Conference on 
Algorithms and Complexity (CIAC 2000). The conference took place on March 
1-3, 2000, in Rome (Italy), at the conference center of the University of Rome 
“La Sapienza”.

This conference was born in 1990 as a national meeting to be held every 
three years for Italian researchers in algorithms, data structures, complexity, 
and parallel and distributed computing. Owing to the significant participation of
foreign researchers, starting from the second conference CIAC evolved into an
international conference.

In response to the call for papers for CIAC 2000, there were 41 submis- 
sions, from which the program committee selected 21 papers for presentation at 
the conference. Each paper was evaluated by at least three program committee 
members. In addition to the selected papers, the organizing committee invited 
Giorgio Ausiello, Narsingh Deo, Walter Ruzzo, and Shmuel Zaks to give plenary 
lectures at the conference. 

We wish to express our appreciation to all the authors of the submitted 
papers, to the program committee members and the referees, to the organizing 
committee, and to the plenary lecturers who accepted our invitation. 
On Salesmen, Repairmen, Spiders, and Other
Traveling Agents 



Giorgio Ausiello, Stefano Leonardi, and Alberto Marchetti-Spaccamela 

Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”,
via Salaria 113, 00198 Roma, Italia.



Abstract. The Traveling Salesman Problem (TSP) is a classical prob- 
lem in discrete optimization. Its paradigmatic character makes it one of 
the most studied in computer science and operations research and one 
for which an impressive number of algorithms (in particular heuristics
and approximation algorithms) have been proposed. While in the general 
case the problem is known not to allow any constant ratio approximation 
algorithm and in the metric case no better algorithm than Christofides’ 
algorithm is known, which guarantees an approximation ratio of 3/2, re- 
cently an important breakthrough by Arora has led to the definition of a 
new polynomial approximation scheme for the Euclidean case. Growing
attention has also recently been paid to the approximation of other
paradigmatic routing problems such as the Travelling Repairman Problem
(TRP). The altruistic Travelling Repairman seeks to minimize the
average time incurred by the customers to be served, rather than to
minimize its own working time as the egoistic Travelling Salesman does. The new
approximation scheme for the Travelling Salesman is also at the basis of 
a new approximation scheme for the Travelling Repairman problem in 
the Euclidean space. Interesting new constant approximation algorithms
have recently been presented also for the Travelling Repairman on gen- 
eral metric spaces. Interesting applications of this line of research can be 
found in the problem of routing agents over the web. In fact the prob- 
lem of programming a “spider” for efficiently searching and reporting 
information is a clear example of potential applications of algorithms for 
the above mentioned problems. These problems are very close in spirit 
to the problem of searching an object in a known graph introduced by 
Koutsoupias, Papadimitriou and Yannakakis [14]. In this paper, moti- 
vated by web searching applications, we summarize the most important 
recent results concerning the approximate solution of the TRP and the 
TSP and their application and extension to web searching problems. 



1 Introduction 

In computer applications involving the use of mobile virtual agents (sometimes 
called “spiders”) that are supposed to perform some task in a computer network,
a fundamental problem is to design routing strategies that allow the agents to 
complete their tasks in the most efficient way ([3], [4]). In this context a typical 
scenario is the following: an agent generated at node 0 of the network searches 
a portion of the web formed by n sites, denoted by 1, ..., n, looking for a piece of
information. With each site i is associated a probability $p_i$ that the required
information is at that site. The distance from site i to site j is given by a
metric function $d(i,j)$. The aim is to find a path $\pi(1), \ldots, \pi(n)$ that minimizes
the quantity $\sum_{i=1}^{n} p_{\pi(i)} \sum_{k=1}^{i} d(\pi(k-1), \pi(k))$, where $\pi(0) = 0$. In [14] this problem is called the Graph
Searching problem (GSP in the following). The GSP is shown to be strictly
related to the Travelling Repairman Problem (TRP), also called the Minimum
Latency Problem (MLP), in which a repairman is supposed to visit the nodes of
a graph in such a way as to minimize the overall waiting time of the customers sitting
at the nodes of the graph. More precisely, in the TRP we wish to minimize the
quantity $\sum_{i=1}^{n} \sum_{k=1}^{i} d(k-1, k)$, where the vertices are numbered in the order in which they are visited.

The Minimum Latency problem is known to be MAX-SNP-hard for general 
metric spaces as a result of a reduction from the TSP where all the distances are 
either 1 or 2, while it is solvable in polynomial time for the case of line networks 
[1].

In this paper we present the state of the art of the approximability of the TRP 
and present the extension of such results to the Graph Searching problem. The 
relationship between GSP and TRP is helpful from two points of view. In some 
cases, in fact, an approximation preserving reduction from GSP to TRP can be 
established [14] under the assumption that the probabilities associated with the 
vertices are polynomially related, by replacing every vertex with a polynomial 
number of vertices of equal probability. This allows us to apply the approximation
algorithms developed for TRP to GSP. Among them, particularly interesting are 
the constant approximation algorithms for general metric spaces given by Blum 
et al. [9] and Goemans and Kleinberg [13], later improved in combination with
a result of Garg [11] on the k-MST problem.

More recently, a quasi-polynomial approximation scheme for tree
networks and Euclidean spaces has been proposed by Arora and Karakostas [5].
This uses the same technique as the quasi-polynomial approximation scheme
of Arora for the TSP [6]. The case of tree networks seems particularly interesting
since one is often willing to run the algorithm on a tree covering a portion of the 
network that hopefully contains the required information. In the paper we also 
show how to extend approximation schemes for the TRP to the Graph Searching 
problem. 

In conclusion, the Graph Searching problem, besides being an interesting problem
“per se”, motivated by the need to design efficient strategies for moving
“spiders” in the web, has several interesting connections with two of the most
intriguing and paradigmatic combinatorial graph problems, the Traveling Repairman
problem and the Traveling Salesman problem. Therefore the study of
the former problem naturally leads to the study of the results obtained for the latter
problems, which may be classified among the most interesting breakthroughs
achieved in the recent history of algorithmics.

This paper is organized as follows: in section 2 we formally define the GSP 
and in section 3 we review approximation algorithms for the TRP. In sections 4 
and 5 we present approximation algorithms and approximation schemes for the
GSP.

2 Preliminaries and Notation 

The Graph Searching Problem (GSP), introduced by Koutsoupias, Papadimitriou
and Yannakakis [14], is defined on a set of n vertices $V = \{1, \ldots, n\}$
of a metric space M, plus a distinguished vertex of M at which the travelling
agent is initially located and to which it will return at the end of the tour. The starting
point is also called the root and is indicated as vertex 0. A metric distance
$d(i,j)$ defines the distance between any pair of vertices i, j. With every vertex
i is also associated a probability or weight $w_i > 0$ (the vertices with weight 0
are simply ignored). We assume that the object is in exactly one site, so that
$\sum_{i=1}^{n} w_i = 1$. A solution to the GSP is a permutation $\pi(1), \ldots, \pi(n)$ indicating
the tour to be followed. The distance of vertex $\pi(i)$ to the root along the tour is
given by $l(\pi(i)) = \sum_{k=1}^{i} d(\pi(k-1), \pi(k))$, with $\pi(0) = 0$. The objective of the GSP is to minimize
the expected time spent to locate the object in the network, namely $\sum_{i=1}^{n} w_i\, l(i)$.

We will measure the performance of algorithms by their approximation ratio, 
that is the maximum ratio over all input instances between the cost of the 
algorithm’s solution and the optimal solution. 
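
As a minimal sketch of the objective just defined, the following Python helpers evaluate the expected search time of a given visiting order and, for tiny instances only, find the best order by exhaustive search. The names `gsp_cost` and `brute_force_gsp`, and the representation of the metric as a distance matrix with the root at index 0, are assumptions made for illustration and do not come from the paper.

```python
from itertools import permutations


def gsp_cost(order, dist, weight, root=0):
    """Expected search time of visiting the vertices in `order`, starting at `root`.

    dist[a][b] is the metric distance, weight[v] the probability of vertex v.
    """
    total = 0.0
    latency = 0.0          # distance travelled so far along the tour
    prev = root
    for v in order:
        latency += dist[prev][v]       # l(v): distance of v from the root along the tour
        total += weight[v] * latency   # contribution w_v * l(v)
        prev = v
    return total


def brute_force_gsp(dist, weight, root=0):
    """Exhaustive search over all visiting orders (exponential, tiny inputs only)."""
    vertices = [v for v in range(len(dist)) if v != root]
    return min((gsp_cost(p, dist, weight, root), p) for p in permutations(vertices))
```

With all weights set to 1 the same function returns the total latency that the TRP minimizes.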

3 Approximation Algorithms for the TRP 

In this section we will present the approximation algorithms developed in the 
literature for the TRP, which was first introduced in [1]. These results are relevant
to the solution of the GSP, which reduces to the TRP when all the vertices have
equal probabilities or probabilities that are polynomially related.

In the case of line networks the problem is polynomial. 

Theorem 1. [1] There exists an $O(n^2)$ optimal algorithm for the TRP on line
networks.

The algorithm numbers the root with 0, the vertices to the right of the root
with positive integers, and the vertices to the left of the root with negative
integers. By dynamic programming the algorithm stores, for every pair
of vertices $(-l, r)$ with $l, r \ge 0$, (i) the optimal path that visits the vertices
of $[-l, r]$ and ends at $-l$ and (ii) the optimal path that visits the vertices of
$[-l, r]$ and ends at r. The information at point (i) is computed in O(1) time
by selecting the best alternative among (a) the path that follows the optimal
path for $(-(l-1), r)$, ends at $-(l-1)$ and then moves to $-l$, and (b) the optimal
path for $(-l, r-1)$ that ends at $r-1$ and then moves first to r and then to $-l$. The
information at point (ii) is computed analogously in O(1) time.
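
The dynamic program can be sketched in Python as follows. This is an illustrative variant rather than the exact procedure of [1]: it uses the common interval formulation in which each state is reached from the interval with one vertex fewer, and each move is charged to every vertex that has not yet been visited, so that the accumulated value equals the total latency. The function name `trp_on_line` and the input convention (two lists of distances from the root) are assumptions.

```python
def trp_on_line(left, right):
    """Minimum total latency on a line network.

    left / right: distances from the root of the vertices lying to the
    left / right of the root. Runs in O(len(left) * len(right)) time.
    """
    left, right = sorted(left), sorted(right)
    L, R = len(left), len(right)
    n = L + R
    INF = float("inf")
    pos_l = [0.0] + [-x for x in left]    # coordinate of the l-th left vertex (index 0 = root)
    pos_r = [0.0] + list(right)           # coordinate of the r-th right vertex
    # best[l][r][0]: interval [-l, r] visited, agent at -l; best[l][r][1]: agent at r
    best = [[[INF, INF] for _ in range(R + 1)] for _ in range(L + 1)]
    best[0][0] = [0.0, 0.0]
    for l in range(L + 1):
        for r in range(R + 1):
            if l == 0 and r == 0:
                continue
            waiting = n - (l + r) + 1     # vertices still unvisited during the last move
            if l > 0:                     # last new vertex is -l
                prev = best[l - 1][r]
                best[l][r][0] = min(
                    prev[0] + waiting * (pos_l[l - 1] - pos_l[l]),
                    prev[1] + waiting * (pos_r[r] - pos_l[l]),
                )
            if r > 0:                     # last new vertex is r
                prev = best[l][r - 1]
                best[l][r][1] = min(
                    prev[1] + waiting * (pos_r[r] - pos_r[r - 1]),
                    prev[0] + waiting * (pos_r[r] - pos_l[l]),
                )
    return min(best[L][R])
```

Each table entry is obtained from two earlier entries in constant time, which gives the quadratic bound of Theorem 1.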

The TRP is known to be solvable in polynomial time beyond line networks
only for trees with a bounded number of leaves [14]. Whether the TRP is
polynomial-time solvable or NP-hard for general tree networks is still a very interesting
open problem.
The first constant factor approximation for the TRP on general metric spaces
and on tree networks was presented by Blum et al. [9]. The authors introduce
the idea of concatenating a sequence of tours to form the final solution. The
algorithm proposed by the authors computes, for every j = 1, 2, 3, ..., a tree $T_j$ of
cost at most $2^j$ spanning the maximum number of vertices. This procedure
is repeated until all the vertices have been included in a tree. The final tour is
obtained by concatenating the tours obtained by a depth-first traversal of the trees
$T_j$, j = 1, 2, 3, .... Let $m_j$ be the number of vertices spanned by $T_j$, and let $S_j$ be
the set of vertices of $T_j$. Consider a number i such that $m_j < i \le m_{j+1}$. We
can state that the i-th vertex visited in the optimal tour has latency at least
$2^j$. On the other hand, the latency of the i-th vertex visited in the algorithm's
tour is at most $8 \times 2^j$. This is because the latency of the i-th vertex in the tour
of the algorithm is at most $2\left(\sum_{l<j} 2^{l+1} + 2^{j+1}\right) < 8 \times 2^j$. Assume that
a c-approximation algorithm is available for finding a tree of minimum cost
that spans k vertices of the network. Through a binary
search procedure, it can easily be turned into a c-approximation algorithm for the problem of finding a
tree of bounded cost that maximizes the number of vertices spanned.
This immediately results in an 8c-approximation algorithm for the TRP.
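
The factor 8 can be traced to a geometric sum. As a worked check, write $m_j$ for the number of vertices spanned by $T_j$ and assume, as in the description above, that tree $T_l$ has cost at most $2^l$ and that a depth-first traversal at most doubles that cost:

```latex
\begin{align*}
\text{latency of the $i$-th vertex of the algorithm}
  &\le 2\sum_{l=1}^{j+1} 2^{l} \;=\; 2\,(2^{j+2} - 2) \;<\; 8 \cdot 2^{j},\\
\text{latency of the $i$-th vertex of the optimum}
  &\ge 2^{j} \qquad (m_j < i \le m_{j+1}).
\end{align*}
```

Comparing the two bounds vertex by vertex gives a ratio of at most 8 between the two total latencies.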

In a tree network the problem of finding a tree of k vertices of minimum cost
is polynomial-time solvable using a dynamic programming algorithm described
in the paper of Blum et al. [9]. The algorithm first transforms the tree into a
binary tree by replacing every vertex v of degree higher than 2 with a binary tree
whose internal edges have cost 0 and in which every leaf is connected to at most 2 children of v by
edges weighted by the cost of the edges from v to the corresponding children.
The procedure computes, for every vertex of the graph, for every integer j between
1 and k, and for every i = 0, ..., j, the minimum cost tree that collects i vertices
in the left subtree and j - i vertices in the right subtree. This procedure can
clearly be implemented in polynomial time. This implies an 8-approximation
algorithm for the TRP on tree networks.
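
A minimal Python sketch of such a dynamic program is given below. It is a variant, not the procedure of [9]: instead of binarizing the tree it merges the children of a vertex one at a time, which yields the same table. The function name `min_cost_subtrees` and the adjacency format (a dictionary mapping a vertex to a list of `(child, edge_cost)` pairs) are assumptions.

```python
def min_cost_subtrees(children, root=0):
    """Return `best`, where best[j] is the minimum cost of a subtree that
    contains `root` and exactly j vertices (best[0] is unused and infinite).

    children[v] is a list of (child, edge_cost) pairs of the rooted tree.
    Recursive, so intended for small illustrative inputs.
    """
    INF = float("inf")

    def solve(v):
        best = [INF, 0.0]                      # the single vertex v costs nothing
        for child, cost in children.get(v, []):
            sub = solve(child)
            merged = [INF] * (len(best) + len(sub) - 1)
            for a in range(1, len(best)):
                # option 1: take no vertex from this child's subtree
                merged[a] = min(merged[a], best[a])
                # option 2: take b vertices from it, paying the connecting edge
                for b in range(1, len(sub)):
                    cand = best[a] + cost + sub[b]
                    if cand < merged[a + b]:
                        merged[a + b] = cand
            best = merged
        return best

    return solve(root)
```

Reading off the largest j with `best[j]` below a given budget gives the maximum number of vertices that a tree of bounded cost can span, which is the quantity the concatenation scheme needs.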

When the paper [9] appeared, no constant approximation algorithm for the
k-MST problem on general metric spaces was known. A constant approximation
algorithm for the TRP on general metric spaces was then obtained by
applying the so-called $(\alpha,\beta)$-TSP approximator. An $(\alpha,\beta)$-TSP approximator
is an algorithm that, given bounds $\epsilon$ and L, an n-point metric space M and a
starting point p, finds a tour starting at p of length at most $\beta L$ which visits at
least $(1 - \alpha\epsilon)n$ vertices whenever there exists a tour of length L which visits $(1 - \epsilon)n$
vertices. The existence of an $(\alpha,\beta)$-TSP approximator ensures the existence
of an $8\alpha\beta$-approximation algorithm for the TRP. A (3,6) and a (4,4) TSP
approximator were proposed in [9]; a (2,4) and a (2,3) TSP approximator were
later proposed by Goemans and Kleinberg [13].

The paper of Goemans and Kleinberg also presents a new technique to select 
a sequence of tours of growing length to concatenate to form a solution. The 
procedure proposed by Goemans and Kleinberg computes for every number j 
from 1 to n the tour $T_j$ of minimum length that visits j vertices. Let $d_j$ be the
length of tour $T_j$. The goal is to select values $j_1, \ldots, j_m = n$ in order to minimize
the latency of the final tour. Let $p_i$ be the number of new vertices visited during
tour i. Since the number of vertices discovered up to the i-th tour is certainly no
smaller than $j_i$, the following claim of [13] holds:

$$\sum_{i=1}^{m} p_i\, d_{j_i} \le \sum_{i=1}^{m} (j_i - j_{i-1})\, d_{j_i}.$$

It follows that for a number of vertices equal to $\sum_{k=1}^{i} p_k \ge j_i$ we sum a
contribution of at most $d_{j_i}$ on the left side of the inequality and a contribution no smaller
than $d_{j_i}$ on the right side. Moreover, each tour $T_{j_i}$ is traversed
in the direction that minimizes the total latency of the vertices discovered during
that tour. This allows us to bound the total latency of the tour obtained by
concatenating $T_{j_1}, \ldots, T_{j_m}$ by:

$$\sum_{i} \Big(n - \sum_{k=1}^{i} p_k\Big) d_{j_i} + \frac{1}{2} \sum_{i} p_i\, d_{j_i}
\;\le\; \sum_{i} (n - j_i)\, d_{j_i} + \frac{1}{2} \sum_{i} (j_i - j_{i-1})\, d_{j_i}
\;=\; \sum_{i} \Big(n - \frac{j_{i-1} + j_i}{2}\Big) d_{j_i}.$$

The formula above expresses the total latency of the algorithm only
in terms of the indices $j_i$ and of the lengths $d_{j_i}$, independently of the number
of new vertices discovered during each tour. A complete graph $G_n$ on the vertices 0, 1, ..., n is
then constructed in the following way. Arc (i, j) is turned into a directed edge
from min(i,j) to max(i,j). For i < j, arc (i,j) has length $\left(n - \frac{i+j}{2}\right) d_j$. The algorithm
computes a shortest path from node 0 to node n. Assume that the path goes
through vertices $0 = j_0 < j_1 < \ldots < j_m = n$. The tour is then obtained by
concatenating $T_{j_1}, \ldots, T_{j_m}$.

The obtained solution is compared against the following lower bound: $\mathrm{OPT} \ge \sum_{k=1}^{n} d_k/2$.
This lower bound follows from the observation that the k-th vertex
cannot be visited in any optimal tour before time $d_k/2$. The approximation ratio of
the algorithm is determined by bounding the maximum, over all the possible sets
of distances $d_1, \ldots, d_n$, of the ratio between the shortest path in $G_n$ and the lower
bound on the optimal solution. This value turns out to be smaller than 3.5912, thus
improving over the ratio of 8 in [9].
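
The selection of the indices can be sketched as a shortest path computation on a directed acyclic graph. The Python fragment below assumes that arc (i, j), with i < j, has length (n - (i + j)/2) d_j, as in the construction of [13], and that the tour lengths are supplied in a list `d` with `d[0]` unused; the function name `select_tours` is an assumption made for illustration.

```python
def select_tours(d):
    """Choose indices j_1 < ... < j_m = n by a shortest path from 0 to n.

    d[k] is the length of the (approximately) shortest tour through k
    vertices, for k = 1..n; d[0] is ignored.
    Returns (path_length, [j_1, ..., j_m]).
    """
    n = len(d) - 1
    INF = float("inf")
    dist = [INF] * (n + 1)
    pred = [None] * (n + 1)
    dist[0] = 0.0
    for j in range(1, n + 1):              # nodes 0..n are already in topological order
        for i in range(j):
            cand = dist[i] + (n - (i + j) / 2) * d[j]
            if cand < dist[j]:
                dist[j], pred[j] = cand, i
    path, v = [], n
    while v != 0:                          # walk back from n to 0
        path.append(v)
        v = pred[v]
    return dist[n], path[::-1]
```

Because every arc points from a smaller to a larger index, a single pass over the nodes in increasing order suffices and the whole computation takes quadratic time in n.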

Theorem 2. [13] Given a c-approximation algorithm for the problem of finding
a tour of minimum length spanning k vertices on a specific metric space,
there exists a 3.5912c-approximation algorithm for the TRP on the same metric
space.

The method described above yields a 3.5912-approximation for
tree networks. For general metric spaces, a 3-approximation algorithm for the
k-MST problem and for the problem of devising a tour of minimum cost spanning
k vertices was later proposed by Garg [11]. This yields a 10.7736-approximation
algorithm (3 × 3.5912) for the TRP on general metric spaces. This bound can
be further improved by applying the more recent 2.5-approximation k-MST
algorithm of Arya and Kumar [7]. We will describe the algorithm of [11] for the
k-MST in the following section, where we study the extension of the algorithm
for the TRP to the GSP.

4 Approximation Algorithms for the GSP 

In this section we will study the extension of the algorithms for the TRP to the 
GSP. 

The algorithm for the TRP on line networks can be extended to provide a 
polynomial time algorithm for the GSP on line networks. The dynamic program- 
ming algorithm presented in the previous section is simply modified in order to 
increase the cost of a solution by the latency of a vertex weighted by its proba- 
bility rather than just by the latency of a vertex. 

As we mentioned in the introduction, the GSP problem has been introduced 
by Koutsoupias, Papadimitriou and Yannakakis [14]. In that paper the authors 
show a simple reduction from the GSP to the TRP under some restrictive con- 
ditions. They show that the metric GSP can be reduced to the metric TRP 
under the assumption that all the weights/probabilities are rational numbers 
with small coefficients and common denominators. This assumption allows us to
split every vertex into a polynomial number of vertices, each with weight equal to the
reciprocal of the common denominator of all the weights of the vertices of the graph. If two vertices
in the instance of the TRP derive from the splitting of the same vertex in
the instance of the GSP, their internode distance is 0, if the two vertices derive 
from two different vertices in the instance of the GSP, say i and j, their distance 
is d{i,j). A solution to the instance of the TRP obtained from an instance of 
the GSP can be easily turned into a solution of equal cost to the original GSP 
instance, since all the vertices at distance 0 in the TRP can be visited at the 
same time. 
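
The splitting step can be sketched in Python as follows, assuming the weights are given as exact `Fraction` values and the root sits at index 0 with weight 0. The function name `split_vertices` and the returned `origin` list, which records the GSP vertex each TRP vertex was derived from, are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def split_vertices(dist, weight):
    """Turn a GSP instance with rational weights into a TRP instance.

    dist: distance matrix (index 0 = root); weight: list of Fractions,
    with weight[0] = 0 for the root.
    Returns (new_dist, origin, common), where common is the common
    denominator and every non-root TRP vertex stands for weight 1/common.
    """
    denominators = [w.denominator for w in weight[1:]]
    common = reduce(lambda a, b: a * b // gcd(a, b), denominators, 1)
    origin = [0]                                   # TRP vertex 0 is the root
    for v in range(1, len(weight)):
        copies = int(weight[v] * common)           # number of unit-weight copies of v
        origin.extend([v] * copies)
    size = len(origin)
    new_dist = [
        [0 if origin[a] == origin[b] else dist[origin[a]][origin[b]]
         for b in range(size)]
        for a in range(size)
    ]
    return new_dist, origin, common
```

The size of the new instance is the sum of the numerators over the common denominator, which is why the reduction is polynomial only when that denominator is small.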

Unfortunately, this reduction does not apply to the general case. In this 
section we will consider algorithms for the general case of the metric GSP and of 
the GSP on tree networks. When trying to extend the general approach for the 
TRP to the GSP, we need to solve two kinds of problems: (i) find a sequence of
tours to be concatenated to obtain the final tour; (ii) compute every tour to be
concatenated. We will see that the solution of Goemans and Kleinberg for point
(i) does not seem to be easily extensible to the GSP, and that the computation of
every tour to be concatenated can be strictly more difficult than for the TRP.

Kleinberg and Goemans propose to compute, for every k = 1, ..., n, a tour
of minimum cost that spans k vertices. Applying this approach to the
GSP requires finding a tour of minimum cost that spans at least a given weight
for every possible amount of weight, that is, computing the minimum cost tour that covers an amount of weight i for
every $i = 1, \ldots, W = \sum_j w_j$. This clearly results in a pseudo-polynomial time
algorithm. Alternatively we can think of partitioning the interval [0, W] into a
polynomial number of intervals of larger size x, $[ix, (i+1)x]$, $i = 0, \ldots, \lceil W/x \rceil - 1$.
The drawback of such a solution is a weaker lower bound. Let $w = \min_j w_j$
be the minimum weight of a vertex. We can state a lower bound of OPT > 
minjtc^ over the optimal solution, but we cannot state that the optimal 
solution will cover the (j+1)-th amount of weight x before time $d_j/2$. However,
in this section we will follow the approach of Blum et al. [9], which repeatedly
finds a tree of exponentially increasing length spanning the largest amount of
weight until the whole weight has been collected. Their result on the relationship
between TRP and k-MST can be easily extended to the GSP; we define
the W-MST problem as the problem of finding a tree of minimum cost that
covers a weight of at least W.

Theorem 3. [9] Given a c-approximation algorithm for the W-MST problem, there
exists an 8c-approximation algorithm for the GSP.

Let $W_i$ be the total weight collected in the i-th tour of length at most $2^i$.
Let $a_i = W_i - W_{i-1}$. It is possible to see that any algorithm will pay, for the
weight that is collected between $W_{i-1}$ and $W_i$, a latency of at least $2^{i-1}$. We then
obtain an algorithm with approximation ratio 8c if we have a c-approximation
algorithm for finding a tree spanning a maximum amount of weight with cost
bounded by a given value L, or alternatively an algorithm for finding a tree of
minimum cost that covers at least a given weight, say W.

Such algorithms are not known in the literature for either tree networks or general
metric spaces. The problem of finding a tree of minimum cost spanning a
weight of at least W is already NP-hard for tree networks. The reduction is from
Knapsack. Consider n items where the generic item i has cost $c_i$ and benefit
$w_i$. The corresponding instance of the GSP is obtained by constructing a star
network of n leaves where the root is the center of the star, and every leaf i has
weight $w_i$ and is connected to the center by an edge of cost $c_i$. The problem
of finding a tree of maximum weight of bounded cost is clearly NP-hard, as is
the problem of finding a minimum cost tree that spans a weight of
at least W. In the next section we will show how to provide a fully polynomial
time approximation scheme for this problem on tree networks.
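
The star construction used in the reduction can be illustrated with a few lines of Python. The helper names `knapsack_to_star` and `min_cost_tree_covering` are assumptions, and the second function is a deliberately naive exhaustive search meant only to show that, on a star, choosing a tree is the same as choosing a set of items.

```python
from itertools import combinations


def knapsack_to_star(costs, benefits):
    """Build the star network: root 0 at the center, leaf i+1 for item i."""
    edges = {(0, i + 1): c for i, c in enumerate(costs)}   # edge cost = item cost
    weights = [0] + list(benefits)                         # leaf weight = item benefit
    return edges, weights


def min_cost_tree_covering(edges, weights, target):
    """Cheapest subtree containing the root whose weight is at least `target`.

    Exhaustive search over subsets of leaves (exponential time); returns
    (cost, leaves) or None if the target cannot be reached.
    """
    leaves = [v for (_, v) in edges]
    best = None
    for size in range(len(leaves) + 1):
        for subset in combinations(leaves, size):
            if sum(weights[v] for v in subset) >= target:
                cost = sum(edges[(0, v)] for v in subset)
                if best is None or cost < best[0]:
                    best = (cost, subset)
    return best
```

A subtree of the star that contains the root is determined by the set of leaves it includes, so its cost and weight coincide with the cost and benefit of the corresponding set of items.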

In the rest of this section we will show how to extend a constant approximation
algorithm for the k-MST problem to the W-MST problem.

Theorem 4. There exists a constant approximation algorithm for the W-MST 
problem. 

For the sake of exposition, we limit ourselves to showing the extension of the
5-approximation algorithm of Garg for the k-MST problem to the case in which
the goal is to collect a weight of at least W. In [11], a 3-approximation algorithm
is also presented, which improves over the previous constant approximation algorithm
by Blum, Ravi and Vempala [10], while an improved approximation based
on the same techniques was later proposed by Arya and Kumar [7]. These
algorithms are based on the primal-dual method developed by Agrawal, Klein
and Ravi [2] and by Goemans and Williamson [12] to design forests of minimum
cost satisfying various constraints.

In the following we highlight the main changes to the algorithm and
to the analysis of [11] needed to extend the 5-approximation to the W-MST problem.
We consider the problem in the case where the vertex farthest from the root is part of
the optimal solution. An algorithm for the general problem is then obtained by
trying every possible vertex and selecting the best solution. It is well known
that the primal-dual method uses the dual of a relaxation of the linear programming
formulation of the problem as a guide for the algorithm and the analysis. We
need to introduce the standard notation for the primal-dual method. We denote
by S a generic subset of V and by $\delta(S)$ the set of edges with exactly one
endpoint inside S. Let E be the edge set of the graph, e a generic edge (i,j)
of E, and $c_e = d(i,j)$ its cost. In the linear programming formulation of the
problem, a variable $x_e \in \{0, 1\}$, $\forall e \in E$, indicates whether edge e is part of the tree, and
a variable $x_v \in \{0, 1\}$, $v \in V$, indicates whether vertex v is spanned by the tree. The
starting point of the tour is the root r of the tree.

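The linear programming formulation of the W-MST problem after the relaxation
of the integrality constraints on variables $x_e$ and $x_v$ is as follows:

$$
\begin{array}{lll}
\text{minimize} & \sum_{e \in E} c_e x_e & \\
\text{subject to} & \sum_{e \in \delta(S)} x_e \ge x_v & (\forall v, S : v \in S \subseteq V \setminus \{r\}) \\
& \sum_{v \in V} x_v w_v = W & \\
& x_v \le 1 & (\forall v \in V) \\
& x_v \ge 0 & (\forall v \in V) \\
& x_e \ge 0 & (\forall e \in E)
\end{array}
$$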



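In the dual formulation a variable $y_{v,S}$ is associated with every constraint of
the first set, a variable p with the second constraint and a variable $p_v$ with every
constraint of the third set. The dual formulation is as follows:

$$
\begin{array}{lll}
\text{maximize} & pW - \sum_{v \in V} p_v & \\
\text{subject to} & \sum_{S : v \in S} y_{v,S} + p_v \ge p\, w_v & (\forall v \in V) \\
& \sum_{v, S \,:\, v \in S,\ e \in \delta(S)} y_{v,S} \le c_e & (\forall e \in E) \\
& p_v \ge 0 & (\forall v \in V) \\
& y_{v,S} \ge 0 & (\forall v, S : v \in S \subseteq V \setminus \{r\})
\end{array}
$$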
Define, in a way similar to [11], the potential of vertex v as $\pi(v) = \sum_{S : v \in S} y_{v,S}$.

Observe that if $\pi(v) \ge p\, w_v$ then $p_v = 0$, else $p_v = p\, w_v - \pi(v)$. From the previous
observation it is possible to prove that in an optimal solution p has a value between
the $k_W$-th and $(k_W + 1)$-th smallest ratio $\pi(v)/w_v$, where $k_W$ is the smallest
integer such that the total weight of the $k_W$ vertices of smallest ratio is at least W. The optimal solution of the dual problem can
be thought of as an assignment to the dual variables such that the sum of the
first $k_W$ potentials is maximized. By the duality theorem it also follows that the
sum of the first $k_W$ potentials is a lower bound on the optimal solution of the
primal problem.

The primal-dual algorithm will construct a solution with cost bounded by 
twice the sum of the first kw potentials. This solution will be completed to cover 
a weight at least W with an extra cost bounded by a constant factor times the 
optimal value. 

The problem is then reduced to finding a feasible assignment of potentials
$\pi(v)$ such that the sum of the first $k_W$ potentials is maximized. An assignment
of potentials is feasible if there exists an assignment of variables $y_{v,S}$ for which
$\sum_{v, S \,:\, v \in S,\ e \in \delta(S)} y_{v,S} \le c_e$ for every edge e and, for any vertex v, $\pi(v) \le \sum_{S : v \in S} y_{v,S}$.

The primal-dual algorithm is run with an initial potential $p\, w_v$ assigned to every
vertex v apart from the root, to which a potential of 0 is assigned. Every subset S
of vertices not containing the root has an associated variable $y_S$. The assignment
of variables $y_S$ satisfies at any time, for every edge e, $\sum_{S : e \in \delta(S)} y_S \le c_e$. If for
an edge e the inequality holds with equality then the edge is said to be tight. At
any step of the algorithm the set of vertices V is partitioned into a set of active
and inactive components. A component is active if it has a positive potential
and it does not contain the root, otherwise it is inactive.

The algorithm simultaneously increases, for every active component S, the
variable $y_S$ and decreases its potential until either the potential is 0 or one of the
constraints on one of the edges is tight. If the constraint for edge e = (i, j) is tight,
the components containing vertices i and j are merged, with potential equal
to the sum of the residual potentials of the two components. The two components
are made inactive, while the new component is active unless it contains the root.
The set of tight edges at any stage forms a forest whose trees define the set 
of components at that stage. The procedure halts when all the components are 
inactive, that is the residual potential of all the components not containing the 
root is 0. 

The tree spanning the component of the root when the algorithm halts is
denoted by $T_p$. $T_p$ is then pruned to remove every edge that connects to $T_p$ a
subtree spanning a component that was inactive at some stage of the algorithm. Let $T'_p$
be the tree obtained after the pruning phase. The set of initial potentials does
not necessarily form a feasible assignment of potentials. We can follow [11] in
showing that a sufficient condition for the set of potentials to be feasible is that
the component containing the root has zero potential when the algorithm
halts. An assignment of the potentials that satisfies this requirement can be
obtained by decreasing the potential of a vertex such that all the components
containing that vertex have non-zero potential. We can reduce the potential
of every such vertex until a component containing the vertex has 0 residual
potential. This procedure is repeated until the component containing the root
has 0 potential.

Denote by cost(T) the cost of tree T, and by $W_p$ the weight spanned by
the vertices of $T'_p$. The primal-dual method ensures that the cost of $T'_p$ is at
most twice the sum of the potentials of the vertices of $T'_p$, namely
$\mathrm{cost}(T'_p) = \sum_{e \in T'_p} c_e \le 2 \sum_{v \in T'_p} \pi(v)$. The sum of the smallest $k_W$ potentials is also a lower
bound on the cost of the optimal tree spanning a weight of at least W. Since the vertices in $T'_p$
are the only vertices in the graph with ratio $\pi(v)/w_v < p$, the sum of the
potentials of the vertices of the tree $T'_p$ plus $p(W - W_p)$ is also a lower bound
on the optimal solution.

We select the highest value of p such that $W_p \le W$. We then
run the algorithm for a value $p + \epsilon$, thus obtaining a tree $T'_{p+\epsilon}$ for which
$W \le W_{p+\epsilon}$ holds. A first solution is obtained as follows. Consider the tour
obtained by traversing $T'_{p+\epsilon}$ twice. We select the minimum cost path on this
tour that collects a weight of at least $W - W_p$. Such a path has cost bounded by
$2\,\frac{W - W_p}{W_{p+\epsilon} - W_p}\,\mathrm{cost}(T'_{p+\epsilon})$. The first solution is formed by the tree $T'_p$, the
path selected out of $T'_{p+\epsilon}$, and an edge that joins this path to the root.

Denote by OPT the cost of the optimal tour. Recall that OPT is lower
bounded by the maximum distance from a vertex to the root. The cost of the
first solution is bounded by

$$\mathrm{cost}(T'_p) + 2\,\frac{W - W_p}{W_{p+\epsilon} - W_p}\,\mathrm{cost}(T'_{p+\epsilon}) + \mathrm{OPT}.$$

The second solution is obtained from $T'_{p+\epsilon}$. Following the analysis of [11], we
write the following two lower bounds on the optimal solution:

$$\mathrm{OPT} \ge \sum_{v \in T'_p} \pi(v) + p\,(W - W_p),$$

$$\mathrm{OPT} \ge \sum_{v \in T'_{p+\epsilon}} \pi(v) - (p + \epsilon)(W_{p+\epsilon} - W_p).$$

We can write:

$$\mathrm{cost}(T'_p) \le 2\,\mathrm{OPT} - 2p\,(W - W_p),$$

$$\mathrm{cost}(T'_{p+\epsilon}) \le 2\,\mathrm{OPT} + 2(p + \epsilon)(W_{p+\epsilon} - W_p),$$

from which it follows that the smaller of the two solutions has cost at most
5 OPT.

By combining the above analysis with Theorem 3 we obtain the following
corollary.

Corollary 1. There exists a 40-approximation algorithm for the GSP defined
in a general metric space.
5 Approximation Schemes for the GSP



In this section we show under what conditions the GSP allows approximation 
schemes. Also in this case we do so by extending to GSP similar results obtained 
for TRP. In particular we first present the main ideas of [5] that shows how to 
construct an approximation scheme for the TRP in the case of tree metric and 
Euclidean metric whose running time is quasi polynomial; we will then show an 
approximation preserving reduction that allows us to obtain similar results for 
the GSP. As it is done in previous papers on the TRP, the algorithm of [5] finds 
a low latency tour by joining paths; in this case the algorithm decides at the 
beginning how many nodes are in each path and then uses dynamic programming 
for computing this set of paths. 

In order to reduce the cost of dynamic programming, the authors first show
that distances between nodes can be rounded without affecting the approximation;
namely, given an instance of the TRP such that the minimum internode
distance is 1 and the maximum internode distance is $d_{max}$, it is possible to round
internode distances in such a way that the minimum internode distance is 1
and the maximum distance is $cn^2/\epsilon$, where c is a constant. Given a tour T, the
rounding affects the contribution of each node to the latency of T by a value
less than $\epsilon\, d_{max}/n$; since $d_{max}$ is a lower bound on the optimum it follows that
the rounding affects the value of T by a factor of $\epsilon$. The second idea is to break
the optimal tour into k segments, $k = O(\log n/\epsilon)$, each one with a determined
number of nodes; the number of nodes in segment i is given by

$$n_i = \lceil (1 + \epsilon)^{k-1-i} \rceil, \quad i = 1, 2, \ldots, k-1,$$

$$n_k = \lceil 1/\epsilon \rceil.$$



Let Ti be the length of 7j; clearly is a lower bound on the latency 

of any node in segment j. It follows that a lower bound on the optimum latency 
L* is given by: 

k j—1 k k 



L* > 






J = 1 Z=1 2 = 1 j'>i 



Now replace segment $i$, $i < k$, with a minimum traveling salesman tour
through the same set of nodes. In this way both the length of the segment and
the latency of nodes in subsequent segments cannot increase; the latency of nodes
in segment $i$ can increase by at most $n_i T_i$. Repeating this replacement for all segments
but the last one, the increase of the latency is at most

$$ \sum_{i=1}^{k-1} n_i T_i . $$



Observing that $\sum_{j>i} n_j \ge n_i/\epsilon$, it follows that the new latency is at most
$(1 + \epsilon)L^*$. Note that if the above approach is applied using an $\alpha$-approximate solution
for the TSP then the latency of the obtained approximate solution has value at
most $(1 + \alpha\epsilon + \alpha)L^*$.
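The exchange of the two sums in the lower bound can be checked numerically. The sketch below is only an illustration: it assumes the segment sizes $n_i = \lceil (1+\epsilon)^{k-1-i} \rceil$ for $i < k$ and $n_k = \lceil 1/\epsilon \rceil$, the function names are hypothetical, and the segment lengths passed in are arbitrary numbers rather than data from [5].

```python
import math

def segment_sizes(k, eps):
    """Assumed sizes: n_i = ceil((1+eps)^(k-1-i)) for i < k, n_k = ceil(1/eps)."""
    sizes = [math.ceil((1 + eps) ** (k - 1 - i)) for i in range(1, k)]
    sizes.append(math.ceil(1 / eps))
    return sizes

def latency_lower_bound(sizes, lengths):
    """Sum over segments j of n_j times the total length of earlier segments."""
    total, prefix = 0.0, 0.0
    for n_j, t_j in zip(sizes, lengths):
        total += n_j * prefix   # every node of segment j waits at least `prefix`
        prefix += t_j
    return total

def latency_lower_bound_swapped(sizes, lengths):
    """Same quantity with the sums exchanged: T_i is charged to all later nodes."""
    total = 0.0
    later = sum(sizes)
    for n_i, t_i in zip(sizes, lengths):
        later -= n_i            # number of nodes in segments j > i
        total += t_i * later
    return total
```

Both functions return the same value for any choice of sizes and lengths, which is exactly the identity used in the displayed bound.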
Let us now consider the case of tree metric. In such a case a TSP tour that
visits a given set of nodes can be computed in polynomial time. However, the
above approach requires knowing the set of nodes belonging to each segment
in the optimal solution. These sets can be computed in quasi-polynomial time
in the case of tree metric by dynamic programming. In fact, the breakup into $k$
segments implies that an edge is visited in the optimal solution at most $k$ times.

Let us consider the case of a binary tree. The algorithm first identifies an edge that is a 1/3
: 2/3 separator and “guesses” the number of times this edge is
visited and, for each such portion, the length of the portion and the number of
nodes visited. “Guessing” means that, using dynamic programming, the algorithm
exhaustively searches through all possibilities; since there are at most $k = O(\log n/\epsilon)$
portions and the length of each portion is bounded by $O(n^2/\epsilon)$, it follows that
there is a polynomial number of possibilities for each portion and hence a
quasi-polynomial number of solutions overall. By recurring on each side of the
separator edge it is possible to compute an $\epsilon$ breakup into segments in quasi-polynomial time.

The above idea can be applied also to nonbinary trees. 

Theorem 5. [5] For any $\epsilon > 0$, there exists a $(1 + \epsilon)$-approximation algorithm
for the TRP in tree metric that runs in quasi-polynomial time.

In the Euclidean case a similar result can be obtained by making use of
Arora’s approximation scheme [6] for the computation of the TSP paths which
correspond to the segments of the TRP.

Theorem 6. [5] For any $\epsilon > 0$, there exists a $(1 + \epsilon)$-approximation algorithm
for the TRP in Euclidean metric that runs in quasi-polynomial time.

The proof of Theorem 6 will be provided in the next subsection along with the
proof of existence of a polynomial time approximation scheme for the TSP.

Let us now see how we can apply the preceding results in order to design
approximation schemes for the GSP. Recall that, given an instance $x$ of the GSP
with $n$ nodes, if the integer weights associated to the nodes are polynomially
related, then it is easy to see that $x$ is polynomially reducible to an instance
$y$ of the TRP. On the other hand, it can be proved [15] that if the weights are not
polynomially bounded there still exists a polynomial time reduction that preserves
the approximation schemes [8].

Given an instance of the GSP with $n$ nodes, let $w_{max}$ be the maximum weight
associated to a city and let $\epsilon$ be any positive real number. The idea of the proof
is to round the weights associated to each city by a factor $k$, $k = w_{max}\delta/c$, where
$\delta = \epsilon^2/n^3$ and $c$ is a suitably chosen constant. Namely, given an instance $x$ of the
GSP, we define a new instance $x'$, with the same set of nodes and the same metric
distance as $x$, that is obtained by rounding the weight associated to each city:
$w_i$, the weight associated to city $i$, becomes $\lfloor w_i/k \rfloor$. Note that by the
above rounding the weights associated to the nodes of $x'$ are now polynomially
related and, therefore, $x'$ is polynomially reducible to an instance of the TRP.

Assume now that we are given a tour $T$ that is an optimal solution of $x'$;
we now show that $T$ is a $(1 + \epsilon)$-approximate solution of $x$. In fact, following
[5] we can assume that the maximum distance between nodes is $cn^2/\epsilon$, where $c$
is a constant; it follows that the rounding introduces an absolute error for the
contribution of city $i$ to the objective function that is bounded by

$$ k c n^2/\epsilon = w_{max}\,\delta\, n^2/\epsilon = w_{max}\,\epsilon/n . $$

By summing over all nodes we obtain that the total absolute error is bounded
by $w_{max}\epsilon$; since $w_{max}$ is a lower bound on the optimum value of instance $x$, it
follows that $T$ is a $(1 + \epsilon)$-approximate solution of $x$.
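The weight rounding can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' procedure: it reads the rounding parameter as $\delta = \epsilon^2/n^3$ (the choice that makes the per-city error come out to $w_{max}\epsilon/n$), takes $c = 1$ by default, and uses the hypothetical name `round_weights`.

```python
import math

def round_weights(weights, eps, c=1.0):
    """Scale-and-floor rounding of GSP node weights.

    Assumes delta = eps**2 / n**3 and k = w_max * delta / c; the form of
    delta and the default c are assumptions made for illustration.
    """
    n = len(weights)
    w_max = max(weights)
    delta = eps ** 2 / n ** 3
    k = w_max * delta / c
    # each weight loses less than k, and the largest rounded weight
    # is at most c * n**3 / eps**2, i.e. polynomial in n
    rounded = [math.floor(w / k) for w in weights]
    return k, rounded
```

After rounding, the largest weight is bounded by $c n^3/\epsilon^2$ regardless of how large $w_{max}$ was, which is what makes the reduction to the TRP polynomial.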

Assume now that we are given an $\alpha$-approximate solution of $x'$; analogously
we can show that this solution is an $\alpha(1 + \epsilon)$-approximate solution of $x$. The above
reduction together with the approximation results of Theorems 5 and 6 imply
the following theorem.

Theorem 7. There exists a quasi-polynomial time $(1 + \epsilon)$-approximation algorithm
for the GSP in the case of tree metric and Euclidean metric.



5.1 Polynomial Time Approximation Schemes for the Euclidean 
TSP and TRP 

Let us now add the last missing piece needed to prove the existence
of an approximation scheme for the GSP in the Euclidean case: the polynomial
time approximation schemes for the Euclidean TSP [6] and TRP [5]. Let us first
see the result for the TSP.

The basic idea on which Arora’s result is based is the following. In order
to overcome the computational complexity of the TSP in the Euclidean
case we may reduce the combinatorial explosion of the solution space by imposing
that the required approximate solutions satisfy particular structural
properties. Under suitable conditions the number of solutions which satisfy such
properties may be reduced in such a way that we may search for the best
approximate solution by means of a dynamic programming procedure which runs
in polynomial time.

Let us consider a Euclidean TSP instance $x$ consisting of
a set of $n$ points in the plane and let $L$ be the size of its bounding box $B$. Let us
first make the following simplifying assumptions: (1) all nodes have integral
coordinates; (2) the minimum internode distance is 8; (3) the maximum internode
distance is $O(n)$; (4) the size of the bounding box, $L$, is $O(n)$ and it is a power
of 2. It is not difficult to prove that if a PTAS exists for this particular type
of instances, which we call well rounded instances, then it exists for general TSP
instances.

Now, suppose we are given a well rounded TSP instance. In order to
characterize the approximate solutions that satisfy specific structural properties
we may proceed in the following way. We decompose the bounding box through
a recursive binary partitioning until we have at most one point per square cell
(in practice, and more conveniently, we can organize the instance into a
quad-tree). Note that at stage $i$ of the partitioning process we divide any square in the
quad-tree of size $L/2^{i-1}$ which contains more than one point into 4 squares of
size $L/2^i$. Then we identify $m = O(c \log L/\epsilon)$ points evenly distributed on each
side of any square created during the partition (plus four points in the square’s
corners). By slightly bending the edges of a TSP tour we will force it to cross
square boundaries only at those prespecified points (called “portals”). Finally
we allow the partition to be shifted both horizontally and vertically by integer
quantities $a$ and $b$ respectively. The structure theorem can then be stated in the
following terms.
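The recursive partitioning can be illustrated with a small quad-tree builder. This is a minimal sketch under assumptions: points have distinct integral coordinates inside a box whose size is a power of 2, the shift is applied by wrapping coordinates modulo $L$, and all names (`shift_points`, `build_quadtree`, `leaves`, `depth`) are hypothetical.

```python
def shift_points(points, a, b, L):
    """Apply the shift (a, b) to the instance, wrapping around the box of size L."""
    return [((x + a) % L, (y + b) % L) for (x, y) in points]

def build_quadtree(points, x0, y0, size):
    """Split the square [x0, x0+size) x [y0, y0+size) until each cell
    holds at most one point; returns a nested dict describing the cell."""
    inside = [(x, y) for (x, y) in points
              if x0 <= x < x0 + size and y0 <= y < y0 + size]
    node = {"x0": x0, "y0": y0, "size": size, "points": inside, "children": []}
    if len(inside) > 1 and size > 1:      # size > 1 guards against duplicates
        half = size / 2
        for dx in (0, half):
            for dy in (0, half):
                node["children"].append(
                    build_quadtree(inside, x0 + dx, y0 + dy, half))
    return node

def leaves(node):
    """All cells of the partition that were not split further."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]

def depth(node):
    """Number of levels of the quad-tree rooted at `node`."""
    return 1 + max((depth(c) for c in node["children"]), default=0)
```

For a well rounded instance the box size is $O(n)$, so the recursion stops after $O(\log n)$ levels.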

Theorem 8. (Structure Theorem, [6]) Let a well rounded instance $x$ be given,
let $L$ be the size of its bounding box $B$ and let $\epsilon > 0$ be a constant. Let us
pick $a$ and $b$, with $0 < a, b < L$, randomly and let us consider the recursive
partitioning of $B$ shifted by quantities $a$ and $b$. Then with probability at least 1/2
there is a salesman tour of cost at most $(1 + \epsilon)OPT$ (where $OPT$ is the cost of
an optimum solution) that crosses each edge of each square in the partition at
most $r = O(1/\epsilon)$ times, always going through one among $m = O(\log L/\epsilon)$ portals
(such a tour is called $(m, r)$-light).

Essentially the proof of the theorem is based on the fact that, given a recursive
partitioning of $B$ shifted by quantities $a$ and $b$ and given an optimum TSP tour of
length $OPT$, it is possible to bend its edges slightly so that they cross square
boundaries only $O(1/\epsilon)$ times and only at portals, and the resulting expected increase in
length of the path is at most $\epsilon\,OPT/2$. Over all possible shifts $a, b$, therefore,
with probability at least 1/2, such increase is bounded by $\epsilon\,OPT$.
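The bending of a crossing towards a portal can be pictured in one dimension along a single side of a square. The following sketch assumes that the $m$ portals are evenly spaced strictly inside the side and that the two corners also act as portals; the function names are hypothetical.

```python
def portal_positions(side_start, side_length, m):
    """m evenly spaced portals inside a side, plus its two corner points."""
    step = side_length / (m + 1)
    return [side_start + j * step for j in range(m + 2)]

def snap_to_portal(crossing, side_start, side_length, m):
    """Move a boundary crossing to the nearest portal on that side."""
    portals = portal_positions(side_start, side_length, m)
    return min(portals, key=lambda p: abs(p - crossing))
```

Under this spacing a crossing moves by at most half the distance between consecutive portals, so the detour added per crossing is at most one portal spacing.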

On the basis of the structure theorem we can then define the following polynomial
time approximation scheme for the solution of a well rounded TSP instance.
In what follows we call a TSP path a tour, or a fragment of a tour, which goes
through the points in the TSP instance and whose edges are possibly bent
through the portals. Once we have built a unique TSP path that goes
exactly once through all points in the instance, it is immediately possible to
transform it into a TSP tour by taking Euclidean shortcuts whenever possible.

Given the required approximation ratio $(1 + \epsilon)$ and given a TSP instance $x$,
the randomized algorithm performs the following steps:

1) Perturbation. Instance $x$ is first transformed into a well rounded instance
$x'$. We will then look for a $(1 + \epsilon')$-approximate solution for instance $x'$, where
$\epsilon'$ can be easily computed from $\epsilon$.

2) Construction of the shifted quad-tree. Given a random choice of $1 < a, b < L$,
a quad-tree with such shifts is computed. The depth of the quad-tree will be
$O(\log n)$ and the number of squares it will contain is $T = O(n \log n)$.

3) Construction of the path by dynamic programming. The path that satisfies
the structure theorem can be constructed bottom-up by dynamic programming
as follows. In any square there may be $p$ paths, each connecting a pair of portals,
such that for any $i$ a path goes from the first portal to the second portal in pair
$p_i$. Since $2p \le 4r$ and there are $4m + 4$ portals on the borders of one square,
there are at most $(4m + 4)^{4r}$ ways to choose the crossing points of the $p$ paths
and there are at most $(4r)!$ pairings among such crossing points. For each of the
choices we compute the optimum solution, which corresponds to one entry in
the lookup table that the dynamic programming procedure has to construct.
Since we have $T$ squares, the total number of entries is $O(T(4m + 4)^{4r}(4r)!)$.
In order to determine the running time of the procedure let us see how the
entries are constructed; first note that for the leaves of the quad-tree we only
have the condition that one path should go through the node (if any) in the leaf.
Inductively, when we want to construct the entries for a square $S$ at level $i - 1$
of the quad-tree, given the entries of squares $S_1, S_2, S_3, S_4$ at level $i$, we have to
determine the optimum way to connect the paths in the subsquares by choosing
among all possible choices on how to cross the inner borders among the four
squares. Such choices are $(4m + 4)^{4r}(4r)^{4r}(4r)!$. All taken into account we have
a running time $O(T(4m + 4)^{8r}(4r)^{4r}((4r)!)^2)$, that is $O(n(\log n)^{O(1/\epsilon)})$.

The algorithm can easily be derandomized by exhaustively trying all
possible shifts $a, b$ and picking the best path. This simply implies repeating
steps 2) and 3) of the algorithm $O(n^2)$ times. In conclusion we can state the
final result.

Theorem 9. [6] For any $\epsilon > 0$, there exists a $(1 + \epsilon)$-approximation algorithm
for the TSP in Euclidean metric that runs in time $O(n^3(\log n)^{O(1/\epsilon)})$.

The result for the TRP goes along the same lines. As we have seen, in order
to solve the TRP we compute $O(\log n/\epsilon)$ segments consisting of as many
salesman paths. Now the same algorithm as before can be constructed, but this
time we want to compute simultaneously the $O(\log n/\epsilon)$ salesman paths. As a
consequence, while in the case of the TSP we were looking for paths going at
most $r = O(1/\epsilon)$ times through one among $m = O(\log n/\epsilon)$ portals, in the case
of the TRP we construct paths that go $O(\log n/\epsilon)$ times through the $m$ portals.
The same dynamic programming technique as in the case of the TSP can then be
applied, but now, since we may have $O(\log n/\epsilon)$ crossings, the algorithm will
require quasi-polynomial time. Theorem 6 hence follows.
Abstract. A minimum spanning tree (MST) with a small diameter is required in
numerous practical situations. It is needed, for example, in distributed mutual 
exclusion algorithms in order to minimize the number of messages 
communicated among processors per critical section. The Diameter- 
Constrained MST (DCMST) problem can be stated as follows: given an 
undirected, edge-weighted graph G with n nodes and a positive integer k, find a 
spanning tree with the smallest weight among all spanning trees of G which 
contain no path with more than k edges. This problem is known to be NP-
complete, for all values of k; 4 ≤ k ≤ (n - 2). Therefore, one has to depend on
heuristics and live with approximate solutions. In this paper, we explore two 
heuristics for the DCMST problem: First, we present a one-time-tree- 
construction algorithm that constructs a DCMST in a modified greedy fashion, 
employing a heuristic for selecting edges to be added to the tree at each stage of 
the tree construction. This algorithm is fast and easily parallelizable. It is 
particularly suited when the specified values for k are small — independent of n. 
The second algorithm starts with an unconstrained MST and iteratively refines 
it by replacing edges, one by one, in long paths until there is no path left with 
more than k edges. This heuristic was found to be better suited for larger values 
of k. We discuss convergence, relative merits, and parallel implementation of 
these heuristics on the MasPar MP-1 — a massively parallel SIMD machine 
with 8192 processors. Our extensive empirical study shows that the two 
heuristics produce good solutions for a wide variety of inputs. 



1 Introduction 

The Diameter-Constrained Minimum Spanning Tree (DCMST) problem can be stated
as follows: given an undirected, edge-weighted graph G and a positive integer k, find
a spanning tree with the smallest weight among all spanning trees of G which contain
no path with more than k edges. The length of the longest path in the tree is called the
diameter of the tree. Garey and Johnson [7] show that this problem is NP-complete
by transformation from Exact Cover by 3-Sets. Let n denote the number of nodes in
G. The problem can be solved in polynomial time for the following four special
cases: k = 2, k = 3, k = (n - 1), or when all edge weights are identical. All other cases
are NP-complete. In this paper, we consider graph G to be complete and with n
nodes. An incomplete graph can be viewed as a complete graph in which the missing
edges have infinite weights.
The DCMST problem arises in several applications in distributed mutual exclusion
where message passing is used. For example, Raymond's algorithm [5,11] imposes a 
logical spanning tree structure on a network of processors. Messages are passed 
among processors requesting entrance to a critical section and processors granting the 
privilege to enter. The maximum number of messages generated per critical-section 
execution is 2d, where d is the diameter of the spanning tree. Therefore, a small 
diameter is essential for the efficiency of the algorithm. Minimizing edge weights 
reduces the cost of the network. 

Satyanarayanan and Muthukrishnan [12] modified Raymond’s original algorithm
to incorporate the “least executed” fairness criterion and to prevent starvation, also
using no more than 2d messages per process. In a subsequent paper [13], they
presented a distributed algorithm for the readers and writers problem, where multiple
nodes need to access a shared, serially reusable resource. In this algorithm, the
number of messages generated by a read operation and a write operation has an upper
bound of 2d and 2d, respectively.

In another paper on distributed mutual exclusion, Wang and Lang [14] presented a
token-based algorithm for solving the p-entry critical-section problem, where a
maximum of p processes are allowed to be in their critical section at the same time. If
a node owns one of the p tokens of the system, it may enter its critical section;
otherwise, it must broadcast a request to all the nodes that own tokens. Each request
passes at most 2pd messages.

The DCMST problem also arises in Linear Lightwave Networks (LLNs), where
multi-cast calls are sent from each source to multiple destinations. It is desirable to
use a short spanning tree for each transmission to minimize interference in the
network. An algorithm by Bala et al. [4] decomposed an LLN into edge-disjoint trees
with at least one spanning tree. The algorithm builds trees of small diameter by
computing trees whose maximum node-degree was less than a given parameter, rather
than optimizing the diameter directly. Furthermore, the lines of the network were
assumed to be identical. If the LLN has lines of different bandwidths, lines of higher
bandwidth should be included in the spanning trees to be used more often and with
more traffic. Employing an algorithm that solves the DCMST problem can help find
a better tree decomposition for this type of network. The network would be modeled
by an edge-weighted graph, where an edge of weight 1/x is used to represent a line of
bandwidth x.

Three exact-solution algorithms for the DCMST problem developed by Achuthan et al.
[3] used Branch-and-Bound methods to reduce the number of subproblems. The
algorithms were implemented on a SUN SPARC II workstation operating at 28.5
MIPS. The algorithms were tested on complete graphs of different orders (n ≤ 40),
using 50 cases for each order, where edge weights were randomly generated numbers
between 1 and 1000. The best algorithm for k = 4 produced an exact solution for
n = 20 in less than one second on average, but it took an average of 550 seconds for
n = 40. Clearly, such exact algorithms, with exponential time complexity, are not
suitable for graphs with thousands of nodes.

For large graphs, Abdalla et al. [2] presented a fast approximate algorithm. The
algorithm first computed an unconstrained MST, then iteratively refined it by
increasing the weights of (log n) edges near the center of the tree and recomputing the
MST until the diameter constraint was achieved. The algorithm was not always able
to produce DCMST(k) for k < 0.05n because sometimes it reproduced spanning trees
already considered in earlier iterations, thus entering an infinite cycle.
In this paper, we first present a general method for evaluating the solutions to the
DCMST problem in Section 2. Then, we present approximate algorithms for solving 
this problem employing two distinct strategies: One-Time Tree Construction (OTTC) 
and Iterative Refinement (IR). The OTTC algorithm, based on Prim’s algorithm, is 
presented in Section 4. A special IR algorithm and a general one are presented in 
Sections 3 and 5, respectively. 



2 Evaluating the Quality of a DCMST 

Since the exact DCMST weights cannot be determined in a reasonable amount of time 
for large graphs, we use the ratio of the computed weight of the DCMST to that of the 
unconstrained MST as a rough measure of the quality of the solution. 

To obtain a crude upper bound on the DCMST(k) weight (where k is the diameter
constraint), observe that DCMST(2) and DCMST(3) are feasible (but often grossly
suboptimal) solutions of DCMST(k) for all k > 3. Since there are polynomial-time
exact algorithms for DCMST(2) and DCMST(3), these solutions can be used as upper
bounds for the weight of an approximate DCMST(k). In addition, we develop a
special approximate heuristic for DCMST(4) and compare its weight to that of
DCMST(3) to verify that it provides a tighter bound and produces a better solution for
k = 4. We use these upper bounds, along with the ratio to the unconstrained MST
weight, to evaluate the quality of DCMST(k) obtained.



3 Special IR Heuristic for DCMST(4) 

The special algorithm to compute DCMST(4) starts with an optimal DCMST(3), then
replaces higher-weight edges with smaller-weight edges, allowing the diameter to
increase to 4.



3.1 An Exact DCMST(3) Computation 

Clearly, in a DCMST(3) of graph G, every node must be of degree 1 except two
nodes, call them u and v. Edge (u, v) is the central edge of such a spanning tree. To
construct DCMST(3), we select an edge to be the central edge (u, v), then, for every
node x in G, x ∉ {u, v}, we include in the spanning tree the smaller of the two edges
(x, u) and (x, v). To get an optimal DCMST(3), we compute all such spanning trees,
with every edge in G as its central edge, and take the one with the smallest
weight. Since we have m edges to choose from, we have to compute m different
spanning trees. Each of these trees requires (n - 2) comparisons to select (x, u) or
(x, v). Therefore, the total number of comparisons required to obtain the optimal
DCMST(3) is (n - 2)m.
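The enumeration over central edges translates directly into code. The sketch below assumes a complete graph given as a symmetric weight matrix `W` (a list of lists); the function name `dcmst3` and the return format are choices made for this illustration.

```python
def dcmst3(W):
    """Exact diameter-3 spanning tree of a complete graph, found by
    trying every edge as the central edge.

    W is a symmetric n x n weight matrix, n >= 2.
    Returns (weight, edges).
    """
    n = len(W)
    best = None
    for u in range(n):
        for v in range(u + 1, n):
            weight = W[u][v]
            edges = [(u, v)]
            for x in range(n):
                if x in (u, v):
                    continue
                # attach x to whichever end of the central edge is cheaper
                end = u if W[x][u] <= W[x][v] else v
                weight += W[x][end]
                edges.append((x, end))
            if best is None or weight < best[0]:
                best = (weight, edges)
    return best
```

The two outer loops visit each of the m = n(n - 1)/2 edges once and the inner loop performs the (n - 2) comparisons counted in the text.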
3.2 An Approximate DCMST(4) Computation

To compute DCMST(4), we start with an optimal DCMST(3). Then, we relax the
diameter constraint while reducing the spanning tree weight using edge replacement
to get a smaller-weight DCMST(4). The refinement process starts by arbitrarily
selecting one end node of the central edge (u, v), say u, to be the center of DCMST(4).
Let W(a, b) denote the weight of an edge (a, b). For every node x adjacent to v, we
attempt to obtain another tree of smaller weight by replacing edge (v, x) with edge
(x, y), where W(x, y) < W(x, v). Furthermore, the replacement (x, y) is an edge such
that y is adjacent to u and for all nodes z adjacent to u and z ≠ v, W(x, y) ≤ W(x, z). If
no such edge exists, we keep edge (v, x) in the tree. We use the same method to
compute a second DCMST(4), with v as its center. Finally, we accept the DCMST(4)
with the smaller weight as the solution.

Suppose there are p nodes adjacent to u in DCMST(3). Then, there are (n - p - 2)
nodes adjacent to v. Therefore, we make 2p(n - p - 2) comparisons to get
DCMST(4). It can be shown that employing this procedure for a complete graph, the
expected number of comparisons required to obtain an approximate DCMST(4) from
an exact DCMST(3) is (n² - 8n - 12)/2.
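One pass of the edge replacement, with u chosen as the center, might look as follows. This is a sketch under assumptions: the diameter-3 tree is described by its central edge (u, v) and the two lists of leaves, `W` is a symmetric weight matrix, and the names `refine_to_dcmst4` and `tree_weight` are hypothetical.

```python
def refine_to_dcmst4(W, u, v, leaves_u, leaves_v):
    """One pass of the edge-replacement step with u as the center.

    leaves_u / leaves_v: nodes attached to u / v in the diameter-3 tree.
    Returns the edge list of the resulting tree (diameter at most 4).
    """
    edges = [(u, v)] + [(y, u) for y in leaves_u]
    for x in leaves_v:
        # cheapest neighbour of u (other than v) that x could hang from
        y = min(leaves_u, key=lambda z: W[x][z], default=None)
        if y is not None and W[x][y] < W[x][v]:
            edges.append((x, y))      # x moves to depth 2 below u
        else:
            edges.append((x, v))      # no cheaper edge exists: keep (v, x)
    return edges

def tree_weight(W, edges):
    """Total weight of a tree given as an edge list."""
    return sum(W[a][b] for a, b in edges)
```

Calling the function a second time with the roles of u and v (and of the two leaf lists) swapped, and keeping the lighter of the two trees, mirrors the final selection step described above.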



4 One-Time Tree Construction 

In the One-Time Tree Construction (OTTC) strategy, a modification of Prim's
algorithm is used to compute an approximate DCMST in one pass. Prim’s algorithm
has been experimentally shown to be the fastest for computing an MST for large
dense graphs [8].

The OTTC algorithm grows a spanning tree by connecting the nearest neighbor
that does not violate the diameter constraint. Since such an approach keeps the tree
connected in every iteration, it is easy to keep track of the increase in tree diameter.
This Modified Prim algorithm is formally described in Figure 1, where we maintain
the following information for each node u:

• near(u) is the node in the tree nearest to the non-tree node u.

• wnear(u) is the weight of edge (u, near(u)).

• dist(u, 1..n) is the distance (unweighted path length) from u to every other
node in the tree if u is in the tree, and is set to -1 if u is not yet in the tree.

• ecc(u) is the eccentricity of node u (the distance in the tree from u to the
farthest node) if u is in the tree, and is set to -1 if u is not yet in the tree.

To update near(u) and wnear(u), we determine the edges that connect u to the
partially-formed tree T without increasing the diameter (as the first criterion) and
among all such edges we want the one with minimum weight. We do this efficiently,
without having to recompute the tree diameter for each edge addition.

In Code Segment 1 of the OTTC algorithm, we set the dist(v) and ecc(v) values for
node v by copying from its parent node near(v). In Code Segment 2, we update the
values of dist and ecc for the parent node in n steps. In Code Segment 3, we update
the values of dist and ecc for other nodes. We make use of the dist and ecc arrays, as
described above, to simplify the OTTC computation.
procedure ModifiedPrim
INPUT: Graph G, Diameter bound k
OUTPUT: Spanning Tree T = (V_T, E_T)

initialize V_T := Φ and E_T := Φ
select a root node v_0 to be included in V_T
initialize near(u) := v_0 and wnear(u) := W(u, v_0), for every u ∉ V_T
compute a next-nearest-node v such that:
    wnear(v) = MIN over u ∉ V_T of {wnear(u)}
while (|E_T| < (n - 1))
    select the node v with the smallest value of wnear(v)
    set V_T := V_T ∪ {v} and E_T := E_T ∪ {(v, near(v))}

    {1. set dist(v,u) and ecc(v)}
    for u = 1 to n
        if dist(near(v),u) > -1 then
            dist(v,u) := 1 + dist(near(v),u)
    dist(v,v) := 0
    ecc(v) := 1 + ecc(near(v))

    {2. update dist(near(v),u) and ecc(near(v))}
    dist(near(v),v) = 1
    if ecc(near(v)) < 1 then
        ecc(near(v)) = 1

    {3. update other nodes' values of dist and ecc}
    for each tree node u other than v or near(v)
        dist(u,v) = 1 + dist(u,near(v))
        ecc(u) = MAX{ecc(u), dist(u,v)}

    {4. update the near and wnear values for other nodes in G}
    for each node u not in the tree
        if 1 + ecc(near(u)) > k then
            examine all nodes in T to determine near(u) and wnear(u)
        else
            compare wnear(u) to the weight of (u,v).

Fig. 1. OTTC Modified Prim algorithm 
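The growth rule of Figure 1 can be prototyped compactly. The sketch below is a simplified variant, not the figure's procedure: instead of maintaining near and wnear incrementally it rescans all feasible (non-tree node, tree node) pairs in every iteration, which is slower but easier to check. It assumes a complete graph given as a symmetric weight matrix `W`; the names `ottc` and `tree_diameter` are hypothetical.

```python
from collections import deque

def ottc(W, k, root=0):
    """Grow a spanning tree from `root`, always adding the cheapest edge whose
    new leaf keeps the diameter within k. Returns the edge list, or None if
    the greedy construction gets stuck."""
    n = len(W)
    in_tree = [False] * n
    in_tree[root] = True
    dist = [[-1] * n for _ in range(n)]   # hop distances between tree nodes
    dist[root][root] = 0
    ecc = [-1] * n
    ecc[root] = 0
    edges = []
    while len(edges) < n - 1:
        best = None
        for u in range(n):
            if in_tree[u]:
                continue
            for t in range(n):
                # hanging u from t creates paths of at most 1 + ecc[t] edges
                if in_tree[t] and 1 + ecc[t] <= k:
                    if best is None or W[u][t] < best[0]:
                        best = (W[u][t], u, t)
        if best is None:
            return None
        _, v, parent = best
        new_ecc = 0
        for t in range(n):                # extend dist and ecc to the new leaf
            if in_tree[t]:
                d = 1 + dist[parent][t]
                dist[v][t] = dist[t][v] = d
                ecc[t] = max(ecc[t], d)
                new_ecc = max(new_ecc, d)
        dist[v][v] = 0
        ecc[v] = new_ecc
        in_tree[v] = True
        edges.append((v, parent))
    return edges

def tree_diameter(n, edges):
    """Longest hop distance in a tree, by breadth-first search from every node."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    best = 0
    for s in range(n):
        seen = {s: 0}
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in seen:
                    seen[b] = seen[a] + 1
                    queue.append(b)
        best = max(best, max(seen.values()))
    return best
```

The feasibility test `1 + ecc[t] <= k` is the same check that Code Segment 4 applies to near(u); `tree_diameter` is included only so that the result can be verified independently.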

Code Segment 4 is the least intuitive. Here, we update the near and wnear values
for every node not yet in the tree by selecting an edge which does not increase the tree
diameter beyond the specified constraint and has the minimum weight among all such
edges. Now, adding v to the tree may or may not increase the diameter. If the tree
diameter increases, and near(u) lies along a longest path in the tree, then adding u to
the tree by connecting it to near(u) may violate the constraint. In this case, we must
reexamine all nodes of the tree to find a new value for near(u) that does not violate
the diameter constraint. This can be achieved by examining ecc(t) for nodes t in the
tree; i.e., we need not recompute the tree diameter. This computation includes adding
a new node to the tree.

On the other hand, if (u, near(u)) is still a feasible edge, then near(u) is the best
choice for u among all nodes in the tree except possibly v, the newly added node. In
this case, we need only determine whether edge (u, v) would increase the tree
diameter beyond the constraint, and if not, whether the weight of (u, v) is less than
wnear(u).

The complexity of Code Segment 4 is O(n²) when the diameter constraint k is
small, since it requires looking at each node in the tree once for every node not in the
tree. This makes the time complexity of this algorithm higher than that of Prim's
algorithm. The while loop requires (n - 1) iterations. Each iteration requires at most
O(n²) steps, which makes the worst case time complexity of the algorithm O(n³).

This algorithm does not always find a DCMST. Furthermore, the algorithm is
sensitive to the node chosen for starting the spanning tree. In both the sequential and
parallel implementations, we compute n such trees, one for each starting node. Then,
we output the spanning tree with the smallest weight.

To reduce the time needed to compute the DCMST further, we develop a heuristic
that selects a small set of starting nodes as follows. Select the q nodes (q is
independent of n) with the smallest sum of weights of the edges emanating from each
node. Since this is the defining criterion for spanning trees with diameter k = 2 in
complete graphs, it is polynomially computable. The algorithm now produces q
spanning trees instead of n, reducing the overall time complexity by a factor O(n)
when we choose a constant value for q.
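The selection of starting nodes is a simple ranking by total incident weight. A minimal sketch, assuming a complete graph stored as a symmetric weight matrix `W`; the function name is hypothetical and ties are broken by node index.

```python
def pick_start_nodes(W, q):
    """Return the q nodes with the smallest total weight of incident edges."""
    n = len(W)
    totals = [(sum(W[u][x] for x in range(n) if x != u), u) for u in range(n)]
    return [u for _, u in sorted(totals)[:q]]
```

Each returned node would then serve as the root of one run of the tree construction, and the lightest of the q resulting trees is kept.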



5 The General Iterative Refinement Algorithm 

This IR algorithm does not recompute the spanning tree in every iteration; rather, a 
new spanning tree is computed by modifying the previously computed one. The 
modification performed never produces a previously generated spanning tree and
thus guarantees that the algorithm will terminate. Unlike the algorithm in [2], this
algorithm removes one edge at a time and prevents cycling by moving away from the 
center of the tree whenever cycling becomes imminent. 

This new algorithm starts by computing the unconstrained MST for the input graph 
G = (V, E). Then, in each iteration, it removes one edge that breaks a longest path in 
the spanning tree and replaces it by a non-tree edge without increasing the diameter. 
The algorithm requires computing eccentricity values for all nodes in the spanning 
tree in every iteration. 

The initial MST can be computed using Prim’s algorithm. The initial eccentricity
values for all nodes in the MST can be computed using a preorder tree traversal where
each node visit consists of computing the distances from that node to all other nodes
in the spanning tree. This requires a total of O(n²) computations. As the spanning
tree changes, we only recompute the eccentricity values that change. After computing
the MST and the initial eccentricity values, the algorithm identifies one edge to
remove from the tree and replaces it by another edge from G until the diameter
constraint is met or the algorithm fails. When implemented and executed on a variety
of inputs, we found that this process required no more than (n + 20) iterations. Each
iteration consists of two parts. In the first part, described in Subsection 5.1, we find
an edge whose removal can contribute to reducing the diameter, and in the second
part, described in Subsection 5.2, we find a good replacement edge. The IR algorithm
is shown in Figure 2, and its two different edge-replacement subprocedures are shown
in Figures 3 and 4. We use ecc_T(u) to denote the eccentricity of node u with respect to
spanning tree T, the maximum distance from u to any other node in T. The diameter
of a spanning tree T is given by MAX{ecc_T(u)} over all nodes u in T.



5.1 Selecting Edges for Removal 

To reduce the diameter, the edge removed must break a longest path in the tree and
should be near the center of the tree. The center of spanning tree T can be found by
identifying the nodes u in T with ecc_T(u) = ⌈diameter/2⌉, the node (or two nodes) with
minimum eccentricity.
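Eccentricities and the center can be computed with one breadth-first search per node, which matches the O(n²) initialization cost for a tree. The sketch assumes the tree is given as an edge list over nodes 0..n-1; the function names are hypothetical.

```python
from collections import deque

def eccentricities(n, edges):
    """Hop-count eccentricity of every node of a tree given as an edge list."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    ecc = []
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            a = queue.popleft()
            for b in adj[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        ecc.append(max(dist.values()))
    return ecc

def tree_center(ecc):
    """Nodes whose eccentricity equals ceil(diameter / 2)."""
    diameter = max(ecc)
    target = (diameter + 1) // 2
    return [u for u, e in enumerate(ecc) if e == target]
```

A path on five nodes has a single center, while a path on four nodes has two adjacent center nodes.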

Since we may have more than one edge candidate for removal, we keep a sorted 
list of candidate edges. This list, which we call MID, is implemented as a max-heap 
sorted according to edge weights, so that the highest-weight candidate edge is at the 
root. 

Removing an edge from a tree does not guarantee breaking all longest paths in the
tree. The end nodes of a longest path in T have maximum eccentricity, which is equal
to the diameter of T. Therefore, we must verify that removing an edge splits the tree
T into two subtrees, subtree1 and subtree2, such that each of the two subtrees contains
a node v with ecc_T(v) equal to the diameter of the tree T. If the highest-weight edge
from list MID does not satisfy this condition, we remove it from MID and consider
the next highest. This process continues until we either find an edge that breaks a
longest path in T or the list MID becomes empty.
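The condition on the two subtrees can be tested by collecting one side of the split and looking for a node of maximum eccentricity on each side. A minimal sketch, assuming the tree is an edge list over nodes 0..n-1 and `ecc` holds the eccentricities with respect to the whole tree; the function name is hypothetical.

```python
def removal_breaks_longest_path(n, edges, ecc, edge):
    """True if deleting `edge` leaves a node of maximum eccentricity on each side."""
    diameter = max(ecc)
    a, b = edge
    adj = [[] for _ in range(n)]
    for p, q in edges:
        if {p, q} != {a, b}:              # skip the edge being removed
            adj[p].append(q)
            adj[q].append(p)
    # collect the component containing a; the rest is the other subtree
    side = {a}
    stack = [a]
    while stack:
        p = stack.pop()
        for q in adj[p]:
            if q not in side:
                side.add(q)
                stack.append(q)
    in_first = any(ecc[x] == diameter for x in side)
    in_second = any(ecc[x] == diameter for x in range(n) if x not in side)
    return in_first and in_second
```

In a path 0-1-2-3-4 with an extra leaf 5 hanging from node 2, the edge (1, 2) passes the test while the edge (2, 5) does not, since node 5 is not an end of any longest path.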

If we go through the entire list, MID, without finding an edge to remove, we must
consider edges farther from the center. This is done by identifying the nodes u with
ecc_T(u) = ⌈diameter/2⌉ + bias, where bias is initialized to zero, and incremented by 1
every time we go through MID without finding an edge to remove. Then, we
recompute MID as all the edges incident to the set of nodes u. Every time we succeed in
finding an edge to remove, we reset the bias to zero.
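The list MID and the role of bias can be sketched with Python's `heapq` module, which provides a min-heap; negating the weights turns it into the max-heap described above. The representation of edge weights as a dictionary keyed by edge tuples and the function names are assumptions of this example.

```python
import heapq

def build_mid(edges, weight, ecc, bias=0):
    """Max-heap (via negated weights) of tree edges incident to nodes whose
    eccentricity is ceil(diameter / 2) + bias."""
    diameter = max(ecc)
    target = (diameter + 1) // 2 + bias
    heap = [(-weight[(a, b)], (a, b)) for (a, b) in edges
            if ecc[a] == target or ecc[b] == target]
    heapq.heapify(heap)
    return heap

def pop_heaviest(heap):
    """Remove and return the highest-weight candidate edge and its weight."""
    neg_w, edge = heapq.heappop(heap)
    return edge, -neg_w
```

With bias = 0 only edges touching the center are candidates; each increment of bias moves the candidate set one step farther from the center.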

This method of examining edges helps prevent cycling since we consider a 
different edge every time until an edge that can be removed is found. But to 
guarantee the prevention of cycling, we always select a replacement edge that reduces 
the length of a path in T. This will guarantee that the refinement process will 
terminate, since we will either reduce the diameter below the bound k, or bias will 
become so large that we try to remove the edges incident to the end-points of the 
longest paths in the tree. 



procedure IterativeRefinement 
INPUT: Graph G = (V,E), diameter bound k 
OUTPUT: Spanning tree T with diameter ≤ k 
  compute MST and ecc_T(v) for all v in V 
  MID := ∅ 
  move := false 
  repeat 
    diameter := MAX_{v∈V}{ecc_T(v)} 
    if MID = ∅ then 
      if move = true then 
        move := false 
        MID := edges (u,v) that are one edge farther from 
               the center of T than in the previous iteration 
      else 
        MID := edges (u,v) at the center of T 
    repeat 
      (x,y) := highest weight edge in MID 
      {This splits T into two trees: subtree1 and subtree2} 
    until MID = ∅ 
          OR {subtree1 and subtree2 each contain a node v with ecc_T(v) = diameter} 
    if MID = ∅ then {no good edge to remove was found} 
      move := true 
    else 
      remove (x,y) from T 
      get a replacement edge and add it to T 
      recompute ecc_T values 
  until diameter ≤ k OR we are removing the edges farthest from 
        the center of T 

Fig. 2. The general IR algorithm 

In the worst case, computing list MID requires examining many edges in T, 
requiring O(n) comparisons. In addition, sorting MID will take O(n log n) time. A 
replacement edge is found in O(n²) time since we must recompute eccentricity values 
for all nodes to find the replacement that helps reduce the diameter. Therefore, the 
iterative process, which removes and replaces edges for n iterations, will take O(n³) 
time in the worst case. Since list MID has to be sorted every time it is computed, the 
execution time can be reduced by a constant factor if we prevent MID from becoming 
too large. This is achieved by an edge-replacement method that keeps the tree T fairly 
uniform so that it has a small number of edges near the center, as we will show in the 
next subsection. Since MID is constructed from edges near the center of T, this will 
keep MID small. 



5.2 Selecting a Replacement Edge 

When we remove an edge from a tree T, we split T into two subtrees: subtree1 and 
subtree2. Then, we select a non-tree edge to connect the two subtrees in a way that 
reduces the length of at least one longest path in T without increasing the diameter. 
The diameter of T will be reduced when all longest paths have been so broken. We 
develop two methods, ERM1 and ERM2, to find such replacement edges. 

5.2.1 Edge-Replacement Method ERM1 

This method, shown in Figure 3, selects a minimum-weight edge (a, b) in G 
connecting a central node a in subtree1 to a central node b in subtree2. Among all 
edges that can connect subtree1 to subtree2, no other edge (c, d) will produce a tree 
such that the diameter of subtree1 ∪ subtree2 ∪ {(c, d)} is smaller than the diameter 
of subtree1 ∪ subtree2 ∪ {(a, b)}. However, such an edge (a, b) is not guaranteed to 
exist in incomplete graphs. 



procedure ERM1 
  Recompute ecc_subtree1 and ecc_subtree2 for each subtree by itself 
  m1 := MIN_{u∈subtree1}{ecc_subtree1(u)} 
  m2 := MIN_{u∈subtree2}{ecc_subtree2(u)} 
  (a,b) := minimum-weight edge in G that has: 
           a ∈ subtree1 AND b ∈ subtree2 AND 
           ecc_subtree1(a) = m1 AND ecc_subtree2(b) = m2 
  Add edge (a,b) to T 
  if MID = ∅ OR bias = 0 then 
    move := true 
    MID := ∅ 

Fig. 3. Edge-replacement method ERM1 

Since there can be at most two central nodes in each subtree, there are at most four 
edges to select from. The central nodes in the subtrees can be found by computing 
ecc_subtree1 and ecc_subtree2 in each subtree, then taking the nodes v with ecc_subtree(v) = 
MIN{ecc_subtree(u)} over all nodes u in the subtree that contains v. This selection can be 
done in time. 
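The selection rule of ERM1 can be sketched in a few lines. The sketch assumes that the per-subtree eccentricities are already available as dictionaries and that the edge weights of G are stored in a dictionary keyed by node pairs; `erm1` and its parameters are illustrative names.

```python
def erm1(ecc1, ecc2, weight):
    """Pick the minimum-weight edge joining a central node of subtree1
    to a central node of subtree2.

    ecc1, ecc2: {node: eccentricity measured inside its own subtree}
    weight:     {(u, v): w} for the edges of G (treated as undirected)
    Returns (a, b), or None if G has no edge between the centers.
    """
    m1 = min(ecc1.values())
    m2 = min(ecc2.values())
    centers1 = [a for a in ecc1 if ecc1[a] == m1]
    centers2 = [b for b in ecc2 if ecc2[b] == m2]
    best = None
    for a in centers1:            # at most 2 x 2 = 4 candidate pairs
        for b in centers2:
            w = weight.get((a, b), weight.get((b, a)))
            if w is not None and (best is None or w < best[0]):
                best = (w, a, b)
    return None if best is None else (best[1], best[2])

# subtree1 is the single edge 0-1 (both nodes central);
# subtree2 is the path 2-3-4 (node 3 central).
ecc1 = {0: 1, 1: 1}
ecc2 = {2: 2, 3: 1, 4: 2}
complete_like = {(0, 3): 4, (1, 3): 2, (0, 2): 1}
sparse = {(0, 2): 1}              # no edge between central nodes
```

The second weight table illustrates the remark above: in an incomplete graph there may be no edge between central nodes, in which case the sketch returns None.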

Finally, we set the boolean variable move to true every time we remove an edge 
incident to the center of the tree. This causes the removal of edges farther from the 
center of the tree in the next iteration of the algorithm, which prevents removing the 
edge (a, b) which has just been added. 

This edge-replacement method seems fast at first glance, because it selects one 
out of at most four edges. However, in the early iterations of the algorithm, this method 
creates nodes of high degree near the center of the tree, which causes MID to be very 
large. This, as we have shown in the previous subsection, causes the running time of 
the algorithm to increase by a constant factor. Furthermore, having at most four 
edges from which to select a replacement often causes the tree weight to increase 
significantly. 

5.2.2 Edge-Replacement Method ERM2 

This method, shown in Figure 4, computes ecc_subtree1 and ecc_subtree2 values for each 
subtree individually, as in ERM1. Then, the two subtrees are joined as follows. Let 
the removed edge (x, y) have x ∈ subtree1 and y ∈ subtree2. The replacement edge 
will be the smallest-weight edge (a, b) which (1) guarantees that the new edge does 
not increase the diameter, and (2) guarantees reducing the length of a longest path in 
the tree at least by one. We enforce condition (1) by: 

    ecc_subtree1(a) ≤ ecc_subtree1(x) AND ecc_subtree2(b) ≤ ecc_subtree2(y),    (1) 

and condition (2) by: 

    ecc_subtree1(a) < ecc_subtree1(x) OR ecc_subtree2(b) < ecc_subtree2(y).    (2) 

If no such edge (a, b) is found, we must remove an edge farther from the center of the 
tree, instead. 



procedure ERM2 
  recompute ecc_subtree1 and ecc_subtree2 for each subtree by itself 
  m1 := ecc_subtree1(x) 
  m2 := ecc_subtree2(y) 
  (a,b) := minimum-weight edge in G that has: 
           a ∈ subtree1 AND b ∈ subtree2 AND ecc_subtree1(a) ≤ m1 
           AND ecc_subtree2(b) ≤ m2 AND 
           (ecc_subtree1(a) < m1 OR ecc_subtree2(b) < m2) 
  if such an edge (a,b) is found then 
    add edge (a,b) to T 
  else 
    add the removed edge (x,y) back to T 
    move := true 

Fig. 4. Edge-replacement method ERM2 
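A sequential sketch of the ERM2 selection rule follows. It assumes condition (1) uses non-strict comparisons and condition (2) strict ones, that per-subtree eccentricities are precomputed, and that the edge weights of G are stored in a dictionary; `erm2` is an illustrative name, not the authors' code.

```python
def erm2(ecc1, ecc2, x, y, weight):
    """Pick the minimum-weight edge (a, b), a in subtree1 and b in subtree2,
    that does not increase the diameter (condition 1) and shortens a
    longest path through the removed edge (x, y) (condition 2).

    ecc1, ecc2: {node: eccentricity measured inside its own subtree}
    weight:     {(u, v): w} for the edges of G (treated as undirected)
    Returns (a, b), or None if no edge qualifies.
    """
    m1, m2 = ecc1[x], ecc2[y]
    best = None
    for (u, v), w in weight.items():
        for a, b in ((u, v), (v, u)):          # try both orientations
            if a in ecc1 and b in ecc2:
                no_increase = ecc1[a] <= m1 and ecc2[b] <= m2   # condition (1)
                shortens = ecc1[a] < m1 or ecc2[b] < m2         # condition (2)
                if no_increase and shortens and (best is None or w < best[0]):
                    best = (w, a, b)
    return None if best is None else (best[1], best[2])

# subtree1 is the path 0-1-2 and subtree2 is the path 3-4-5.
ecc1 = {0: 2, 1: 1, 2: 2}
ecc2 = {3: 2, 4: 1, 5: 2}
weight = {(2, 3): 1, (1, 3): 5, (1, 4): 6, (0, 5): 2}
```

When the removed edge is (2, 3), the cheap edges (2, 3) and (0, 5) are rejected because they shorten nothing, and (1, 3) is chosen. When the removed edge already joins the two centers, no edge qualifies and the caller must move farther from the center.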



Since ERM2 is not restricted to the centers of the two subtrees, it works better than 
ERM1 on incomplete graphs. In addition, it can produce DCMSTs with smaller 
weights because it selects a replacement from a large set of edges, instead of 4 or 
fewer edges as in ERM1. The larger number of edges increases the total time 
complexity of the IR algorithm by a constant factor over ERM1. Furthermore, this 
method does not create nodes of high degree near the center of the tree as in ERM1. 
This helps keep the size of list MID small in the early iterations, reducing the time 
complexity of the IR algorithm by a constant factor. 



6 Implementation 

In this section, we present empirical results obtained by implementing the OTTC and 
IR algorithms on the MasPar MP-1, a massively parallel SIMD machine of 8192 
processors. The processors are arranged in a mesh where each processor is connected 
to its eight neighbors. 

Complete graphs K_n, represented by their (n × n) weight matrices, were used as 
input. Since the MST of a randomly generated graph has a small diameter, O(log n) 
[2], such graphs are not suited for studying the performance of DCMST(k) algorithms. 
Therefore, we generated graphs in which the minimum spanning trees are forced to 
have a diameter of (n − 1). 

6.1 One-Time Tree Construction 

We parallelized the OTTC algorithm and implemented it on the MasPar MP-1 for 
graphs of up to 1000 nodes. The DCMST generated from one start node for a graph 
of 1000 nodes took roughly 71 seconds, which means it would take about 20 hours 
to run with n start nodes. We address this issue by running the algorithm for a 
carefully selected small set of start nodes. 

We used two different methods to choose the set of start nodes. SNM1 selects the 
center nodes of the q smallest stars in G as start nodes. SNM2 selects q nodes from G 
at random. As seen in Figure 5, the quality of DCMST obtained from these two 
heuristics, where we chose q = 5, is similar. The execution times of these two 
heuristics were also almost identical. 

The results from running the OTTC algorithm with n start nodes were obtained for 
graphs of 50 nodes and compared with the results obtained with 5 start nodes for the 
same graphs, for k = 4, 5, and 10. The results compare the average value of the 
smallest weight found from SNM1 and SNM2 to the average weight found from the 
OTTC algorithm that runs for n iterations. The quotient of these values is reported. 
For k = 4, the DCMST obtained using SNM1 had a weight of 1.077 times the weight 
from the n-iteration OTTC algorithm. The cost of the SNM2 tree was 1.2 times that of 
the n-iteration tree. For k = 5, the SNM1 weight ratio was 1.081 while the SNM2 weight 
ratio was 1.15. For k = 10, the SNM1 weight ratio was 1.053 while the SNM2 weight ratio 
was 1.085. In these cases, SNM1 outperforms SNM2 in terms of the quality of 
solutions, in some cases by as much as 12%. The results obtained confirm the 
theoretical analysis that predicted an improvement of O(n) in execution time, as 
described in Section 4. The execution time for both SNM1 and SNM2 is 
approximately the same. This time is significantly less than the time taken by the 
n-iteration algorithm, as expected. Therefore, SNM1 is a viable alternative to the 
n-iteration algorithm. 

Fig. 5. Weight of DCMST(5), obtained using two different node-search heuristics, as a multiple 
of MST weight. Initial diameter = n − 1 



6.2 The Iterative Refinement Algorithms 

The heuristic for DCMST(4) was also parallelized and implemented on the MasPar 
MP-1. It produced DCMST(4) with weight approximately half that of DCMST(3), as 
we see in Figures 5, 6, and 8. The time to refine DCMST(4) took about 1% of the 
time to calculate DCMST(3). 

We also parallelized and implemented the general IR algorithm on the MasPar 
MP-1. As expected, the algorithm did not enter an infinite loop, and it always 
terminated within (n + 20) iterations. The algorithm was unable to find a DCMST 
with diameter less than 12 in some cases for graphs with more than 300 nodes. In 
graphs of 400, 500, 1000, and 1500 nodes, our empirical results show a failure rate of 
less than 20%. The algorithm was 100% successful in finding a DCMST with k = 15 
for graphs of up to 1500 nodes. This shows that the failure rate of the algorithm does 
not depend on what fraction of n the value of k is. Rather, it depends on how small 
the constant k is. 

To see this, we must take a close look at the way we move away from the center of 
the tree when we select edges for removal. Note that the algorithm will fail only 
when we try to remove edges incident to the end-points of the longest paths in the 
spanning tree. Also note that we move away from the center of the tree every time we 
go through the entire set MID without finding a good replacement edge, and we return 
to the center of the spanning tree every time we succeed. 

Fig. 6. Quality of DCMST(10) obtained using two different edge-replacement methods. Initial 
diameter = n − 1 

Thus, the only way the algorithm fails is when it is unable to find a good 
replacement edge in ⌈diameter/2⌉ consecutive attempts, each of which includes going 
through a different set of MID. Empirical results show that it is unlikely that the 
algorithm will fail 8 consecutive times, which makes it suitable for finding 
DCMST where the value of k is a constant greater than or equal to 15. The algorithm 
still performs fairly well with k = 10, and we did use that data in our analysis, where 
we excluded the few cases in which the algorithm did not achieve diameter 10. This 
exclusion should not affect the analysis, since the excluded cases all achieved 
diameter less than 15 with approximately the same speed as the successful attempts. 

The quality of DCMST(10) obtained by the IR technique using the two different 
edge replacement methods, ERM1 and ERM2, is shown in Figure 6. The diagram 
shows the weight of the computed DCMST(10) as a multiple of the weight of the 
unconstrained MST. The time taken by the algorithm using ERM1 and ERM2 to 
obtain DCMST(10) is shown in Figure 7. As expected, ERM2 outperforms ERM1 in 
time and quality. In addition, ERM1 uses more memory than ERM2, because the size 
of MID when we use ERM1 is significantly larger than its size when ERM2 is used. 
This is caused by the creation of high-degree nodes by ERM1, as explained in 
Subsection 5.2.1. 

We tested the general IR algorithm, using ERM2, on random graphs. The quality 
of the DCMSTs obtained is charted in Figure 8. Comparing this figure with those 
obtained for the randomly generated graphs forced to have unconstrained MST with 
diameter (n − 1), it can be seen that the quality of DCMST(10) in the graphs starting 
with MSTs of (n − 1) diameter is better than that in unrestricted random graphs. This 
is because the IR algorithm keeps removing edges close to the center of the 
constrained spanning tree, which contains more low-weight edges in unrestricted 
random graphs, coming from the unconstrained MST. But when the unconstrained 
MST has diameter (n − 1), there are more heavy-weight edges near the center that 
were added in some earlier iterations of the algorithm. Therefore, the DCMST for 
this type of graph will lose fewer low-weight edges than in unrestricted random 
graphs. 




Fig. 7. Time to reduce diameter from n − 1 to 10 using two different edge-replacement methods 

Fig. 8. Quality of DCMST(4) and DCMST(10) for unrestricted random graphs 

Furthermore, the weight of DCMST(4) was lower than that of DCMST(10) in 
unrestricted random graphs. Note that the DCMST(4) heuristic approaches the 
diameter optimization from above, rather than from below. When the diameter 
constraint is small, it becomes more difficult for the general IR algorithm to find a 
solution and allows large increases in tree-weight in order to achieve the required 
diameter. The approach from the upper bound, however, guarantees the tree weight 
will not increase during the refinement process. The performance of the DCMST(4) 
algorithm did not change much in unrestricted random graphs. Rather, the quality of 
DCMST(10) deteriorated, exceeding the upper bound. Clearly, the DCMST(4) algorithm 
provides a better solution for this type of graph. 



7 Conclusions 

We have presented three algorithms that produce approximate solutions to the 
DCMST problem, even when the diameter constraint is a small constant. One is a 
modification of Prim’s algorithm, combined with a heuristic that reduces the 
execution time by a factor of n (by selecting a small constant number of nodes as the 
start nodes in the OTTC algorithm) at a cost of a small increase in the weight of the 
DCMST. 

The second is an IR algorithm to find an approximate DCMST. This algorithm is 
guaranteed to terminate, and it succeeds in finding a reasonable solution when the 
diameter constraint is a constant, about 15. The third is a special IR algorithm to 
compute DCMST(4). This algorithm was found to be especially effective for random 
graphs with uniformly distributed edge weights, as it outperformed the other two in 
speed and quality of solution. This algorithm provides a tighter upper bound on 
DCMST quality than the one provided by the DCMST(3) solution. We implemented 
these algorithms on an 8192-processor machine, the MasPar MP-1, for various types of graphs. 
The empirical results from this implementation support the theoretical conclusions 
obtained. 

Abstract. We consider algorithms for a simple one-dimensional point placement 
problem: given N points on a line, and noisy measurements of the distances be- 
tween many pairs of them, estimate the relative positions of the points. Problems 
of this flavor arise in a variety of contexts. The particular motivating example that 
inspired this work comes from molecular biology; the points are markers on a 
chromosome and the goal is to map their positions. The problem is NP-hard under 
reasonable assumptions. We present two algorithms for computing least squares 
estimates of the ordering and positions of the markers: a branch and bound al- 
gorithm and a highly effective heuristic search algorithm. The branch and bound 
algorithm is able to solve to optimality problems of 18 markers in about an hour, 
visiting about 10® nodes out of a search space of 10^® nodes. The local search 
algorithm usually was able to find the global minimum of problems of similar 
size in about one second, and should comfortably handle much larger problem 
instances. 



1 Introduction 



The problem of mapping genetic information has been the subject of extensive research 
since experimenters started breeding fruit flies for physical characteristics. Due to the 
small scale of chromosomes, it has been difficult to obtain accurate information on 
their structure. Many techniques relying on statistical inference of indirect data have 
been applied to deduce this information; see [1] for some examples. 

More recently, researchers have developed many techniques for estimating the relative 
positions of various genetic features by more direct physical means. We are interested 
in one called fluorescent in situ hybridization (FISH). In this technique, pairs of fluo- 
rescently labeled probes are hybridized (attached) to specific sites on a chromosome. 
The 2-d projection of the distance between the probes is measured under a microscope. 
Despite the highly folded state of DNA in vivo and the resulting high variance of in- 
dividual measurements, [10] shows that the genomic distance can be estimated if the 
experiment is repeated in many cells. 

Not surprisingly, if more pairs of probes are measured, and the measurement between 
each pair is repeated many times, the accuracy of the answer increases. Unfortunately, 
so does the cost. Hence, the resulting computational problem is the following: 

Problem: Given N probes on a line, and an incomplete set of noisy pairwise measurements 
between probes, determine the best estimate of the ordering and position of 
the probes. 

If the measurements were complete and accurate, the problem would be easy: the 
farthest pair obviously are the extreme ends, and the intervening points can be placed by 
sorting their distances to the extremes. However, with partial, noisy data, the problem 
is known to be NP-hard. (See [6, 5] for a particularly simple proof.) 
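The easy case can be made concrete with a short sketch. It assumes a complete, symmetric, noise-free distance matrix; `place_exact` is an illustrative name, and the choice of which extreme is called "left" is arbitrary.

```python
def place_exact(dist):
    """Recover ordering and positions from exact, complete pairwise distances.

    dist[i][j] is the exact distance between points i and j.
    Returns (order, positions), with the chosen left extreme at position 0.
    """
    n = len(dist)
    # The farthest pair are the two extreme points; call the first one "left".
    left, _right = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                       key=lambda pair: dist[pair[0]][pair[1]])
    # Each point's position is simply its distance from the left extreme.
    positions = [dist[left][i] for i in range(n)]
    order = sorted(range(n), key=lambda i: positions[i])
    return order, positions

# Four points with true positions 5, 0, 2, 9 and exact distances between them.
true_positions = [5, 0, 2, 9]
dist = [[abs(a - b) for b in true_positions] for a in true_positions]
order, positions = place_exact(dist)
```

With missing or noisy entries this shortcut breaks down: the farthest measured pair need not be the true extremes, and the distances to an extreme may be absent or mutually inconsistent.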

1.1 Previous Work 

Brian Pinkerton previously investigated solving this problem using the seriation algo- 
rithm of [3], and a branch and bound algorithm (personal communication, 6/96). The 
seriation algorithm, which is a local search algorithm, was only moderately effective. 
The branch and bound algorithm, using a simple bounding function, was able to solve 
problems involving up to about 16 probes. 

There has been extensive work on other algorithms to solve DNA mapping prob- 
lems, but they are based on distance estimates from techniques other than FISH, and are 
tailored to the particular statistical properties of the distance measurements. Two among 
many examples are the distance geometry algorithm of [7], based on recombination fre- 
quency data, and [2] , which investigated branch and bound, simulated annealing, and 
maximum likelihood algorithms based on data from radiation hybrid mapping. 

1.2 Outline 

We present two algorithms for finding least-squares solutions to the probe placement 
problem. One is a branch and bound algorithm that can find provably optimal solutions 
to problems of moderate, but practically useful, size. The second is a heuristic search 
algorithm, fundamentally a “hill-climbing” or greedy algorithm, that is orders of mag- 
nitude faster than the branch and bound algorithm, and although it is incapable of giving 
certifiably optimal solutions, it appears to be highly effective on this data. 

In the next section we sketch some of the more difficult aspects of the problem. 
Section 3 develops a cost function to evaluate solutions. Section 4 describes the heuristic 
search algorithm. Section 5 outlines the branch and bound algorithm. We then present 
the results of simulations of the two algorithms in Section 6. 



2 Introduction to the Solution Space 

Before explaining the development of the algorithms, it is helpful to gain some intuition 
about the solution space. Given that the data is both noisy and incomplete, the problem 
can be under-constrained and/or over-constrained. In this domain, a “constraint” refers 
to a measurement between two probes (since it constrains the placement of the probes). 

An under-constrained problem instance is one in which a probe might not have 
enough measurements to other probes to uniquely determine its position. In the example 
of four probes in Figure 1, probe B has only one measurement to probe A, and so a 
location on either side of probe A is consistent with the data. It is also important to note 
that in all solutions, left/right orientation is arbitrary, as is the absolute probe position. 

Fig. 1. An example of an under-constrained ordering (Probe B can be placed on either 
side of probe A). A line between two probes indicates a measurement between the 
probes. 

In a more extreme example, a set of probes could have no measurements to an- 
other set. In Figure 2, probes A and B have no measurements to probes C and D, and 
placement anywhere relative to probes C and D is consistent with the data. 



[Figure 2: two alternative placements of probes A and B relative to probes C and D] 

Fig. 2. Another example of an under-constrained ordering 



In the examples of Figures 1 and 2, not only are the positions not uniquely deter- 
mined, but different orderings are possible. When developing search algorithms, we 
have to be careful to recognize and treat such cases correctly. It appears that in real 
data such as from [Trask, personal communication, 1996], there are no degrees of free- 
dom in the relative positioning of probes due to the careful choice of pairs of probes to 
measure. However, under-constrained instances do arise in the branch and bound algo- 
rithm described in Section 5 and in any algorithm that solves the problem by examining 
instances with a reduced set of constraints. 

Due to the noise in the data, parts of a problem instance will be over-constrained. 
For example, as shown in Figure 3, if we examine three probes with pairwise mea- 
surements between them and there isn’t an ordering such that the sum of two pairwise 
measurements equals the third pairwise measurement, there will be no way to place 
the three probes on a line. In this case, the distances between the probes in any linear 
placement will unavoidably be different from the measured distances. 



[Figure 3: three probes with pairwise measurements; one edge label, 12, is legible] 

Fig. 3. There is no way to place these probes on a line and respect all the 
measurements. 

Given the existence of over- and under-constrained problems, it is necessary to 
develop a method of evaluating how well a solution conforms to the data. This is covered 
in Section 3. Once we define how to evaluate a solution, we will develop algorithms to 
search for the best solution. 



3 How to Evaluate a Probe Placement 



We construct a cost function to evaluate the “goodness” of a solution, then solve the 
problem by finding the answer that has the least cost. Let N be the number of probes, 
x_i be the assigned position of probe i, d_ij be the measured distance between probe i 
and probe j (d_ij = d_ji), and let w_ij = w_ji be a nonnegative weight associated with the 
measured distance between probes i and j. We define the cost of this placement to be 
the weighted sum of squares of the differences between the measured distance between 
two probes and the distance between the probes in the given linear placement of the 
probes: 

\[ \mathrm{Cost}(x_1, \ldots, x_N) = \sum_{\substack{i<j \\ d_{ij}\ \text{measured}}} w_{ij} \bigl( |x_i - x_j| - d_{ij} \bigr)^2. \tag{1} \]

Many subsequent formulae will be simplified by assuming w_ij = 0 if i = j or if the 
distance d_ij has not been measured. For example, we could have omitted the qualifier 
“d_ij measured” from Equation 1 under this assumption. 
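Equation 1 translates directly into code. In this sketch the measurements and weights are assumed to be stored in dictionaries keyed by probe pairs (i, j) with i < j, holding only measured pairs; this storage layout is an illustrative choice, not the paper's.

```python
def cost(x, d, w):
    """Weighted least-squares cost of placement x (Equation 1).

    x: list of positions, x[i] for probe i
    d: {(i, j): measured distance}, only for measured pairs
    w: {(i, j): nonnegative weight}, same keys as d
    """
    return sum(w[i, j] * (abs(x[i] - x[j]) - d[i, j]) ** 2 for (i, j) in d)

# Three probes whose measurements cannot all be satisfied on a line (3 + 2 != 6).
d = {(0, 1): 3, (1, 2): 2, (0, 2): 6}
w_uniform = {(0, 1): 1, (1, 2): 1, (0, 2): 1}
w_trusted = {(0, 1): 1, (1, 2): 1, (0, 2): 2}   # more confidence in d[0, 2]
x = [0, 3, 5]
```

For the placement x = [0, 3, 5] only the pair (0, 2) is violated, by one unit, so the cost equals the weight on that pair: 1 with uniform weights and 2 when that measurement is trusted twice as much.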

Intuitively, the weight w_ij reflects the relative confidence we have in measurement 
d_ij. For example, if the measurement errors were independent normal random variables, 
then we should choose the weight w_ij to be proportional to 1/σ_ij², where σ_ij² is the 
variance of d_ij. Least squares solutions under these assumptions have several desirable 
properties, like being unbiased maximum likelihood estimators. Even though the error 
distribution in our motivating problem violated these assumptions, choosing weights 
inversely proportional to the variances substantially improved the solution quality (and 
speed) of our algorithms; see [9]. 



3.1 Finding Least Squares Solutions for a Fixed Ordering 

One standard approach to solving a least squares problem is to take the partial derivatives 
of the cost with respect to each of the x_i's, set them equal to 0, and solve. Unfortunately, 
our cost function is not differentiable due to the absolute value terms. However, 
for a given fixed ordering of the probes we can bypass this difficulty, allowing us to find 
the placement which minimizes cost for the given ordering. Without loss of generality, 
assume 

\[ x_1 < x_2 < \cdots < x_N. \tag{2} \]







Joshua Redstone and Walter L. Ruzzo 



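Then for a given probe k: 

\[ \frac{\partial\,\mathrm{Cost}}{\partial x_k} = \frac{\partial}{\partial x_k} \Bigl( \sum_{i<j} w_{ij} (x_j - x_i - d_{ij})^2 \Bigr) = 2 \sum_{1 \le i \le k-1} w_{ik} (x_k - x_i - d_{ik}) - 2 \sum_{k+1 \le i \le N} w_{ki} (x_i - x_k - d_{ki}). \tag{3} \]

Separating the terms and setting the derivative equal to 0, we get for each k: 

\[ x_k \sum_{1 \le i \le N} w_{ik} - \sum_{1 \le i \le N} w_{ik} x_i = \sum_{1 \le i \le k-1} w_{ik} d_{ik} - \sum_{k+1 \le i \le N} w_{ki} d_{ki}. \tag{4} \]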

Equation 4 is of the form 

\[ Mx = r \tag{5} \]

where x is the vector of x_i's, M is the matrix defined as: 

\[ M_{ij} = \begin{cases} -w_{ij} & i \neq j \\ \sum_{1 \le p \le N} w_{ip} & i = j \end{cases} \tag{6} \]



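and r is the vector whose k-th component r_k is given by the right hand side of Equation 4. 
Thus, in matrix form, Equation 4 can be written as: 

\[ \begin{pmatrix} & & \vdots & & \\ -w_{k1} & \cdots & M_{kk} & \cdots & -w_{kN} \\ & & \vdots & & \end{pmatrix} \begin{pmatrix} \vdots \\ x_k \\ \vdots \end{pmatrix} = \begin{pmatrix} \vdots \\ \sum_{1 \le i \le k-1} w_{ik} d_{ik} - \sum_{k+1 \le i \le N} w_{ki} d_{ki} \\ \vdots \end{pmatrix} \]

where M_kk, the summation term in Equation 6, is the sum of the weights of the 
measurements from probe k to other probes. 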
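To make the construction concrete, the following sketch assembles M and r for a given ordering. It assumes probes are numbered 0..N-1, measurements and weights are stored in dictionaries keyed by (i, j) with i < j, and `order` lists the probes from left to right; `build_system` is an illustrative name.

```python
def build_system(order, d, w):
    """Assemble M and r of Equations 4-6 for a fixed left-to-right ordering.

    order: list of probe indices from leftmost to rightmost
    d, w:  {(i, j): value} for measured pairs only
    Rows and columns are indexed by probe number, so M does not depend on
    `order`; only r does.
    """
    n = len(order)
    rank = {probe: k for k, probe in enumerate(order)}
    M = [[0.0] * n for _ in range(n)]
    r = [0.0] * n
    for (i, j), dij in d.items():
        wij = w[i, j]
        M[i][i] += wij          # diagonal: sum of weights at each probe
        M[j][j] += wij
        M[i][j] -= wij          # off-diagonal: -w_ij
        M[j][i] -= wij
        left, right = (i, j) if rank[i] < rank[j] else (j, i)
        r[right] += wij * dij   # measurement to a probe on the left:  +w*d
        r[left] -= wij * dij    # measurement to a probe on the right: -w*d
    return M, r

d = {(0, 1): 3, (1, 2): 2, (0, 2): 6}
w = {(0, 1): 1, (1, 2): 1, (0, 2): 1}
M, r = build_system([0, 1, 2], d, w)
```

Every row of M sums to zero, which reflects the fact that the cost is unchanged when all probes are shifted by the same amount. A solver therefore has to fix one degree of freedom, for example by pinning one probe at position 0.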

A critical point is that there is no guarantee that the ordering of the probes in the 
solution of Mx = r will respect the ordering (2) used to construct this linear system. 
However, the solution to this linear system provides useful information in either case. 

- If the solution does respect the ordering, then it provides the optimal (in the least- 
squares sense) positioning of the probes with respect to the given ordering, and is a 
local minimum of the cost function. 

- If the solution does not respect the ordering, then it gives a lower bound on the cost 
of the best placement with that ordering. This is true since the solution to Mx = r gives 
the minimum of Σ_{i<j} w_ij (x_j − x_i − d_ij)² over all x, which is certainly no greater 
than the minimum over the region {x | x_1 < x_2 < ... < x_N}. Furthermore, in this 
case the given ordering is not the optimal one, since the solution to Mx = r gives 
an ordering having a lower cost. This holds since for each pair i < j for which 
x_i > x_j, we have 

\[ \bigl( |x_i - x_j| - d_{ij} \bigr)^2 = (x_i - x_j - d_{ij})^2 \le (x_j - x_i - d_{ij})^2. \]

In other words, at the point x solving Mx = r, each term in the true cost function 
is less than or equal to the corresponding term in the restricted cost function built 
assuming the ordering x_1 < x_2 < ... < x_N, and so that ordering cannot be 
optimal. 

These are the key observations on which our algorithms are built. The problem has 
been reduced from a continuous optimization problem to a discrete one: that of computing 
the matrix solution over all probe orderings and choosing the ordering with the 
lowest cost. Our branch and bound algorithm searches over all possible probe orderings, 
using an extension of the method above to bound the cost of large sets of possible orderings, 
provably finding the one(s) of minimum cost. The branch and bound algorithm is 
described more fully in Section 5. Our heuristic search algorithm is even simpler. Starting 
from many random orderings, it merely iterates the process described above until 
it reaches a local minimum. Empirically, this is highly effective at finding the global 
minimum quickly. This is described more fully in the next section. 



4 Heuristic Search 



As outlined in the previous section, the solution to the linear system constructed for any 
fixed order π of the probes either gives the optimal placement for probes in that order, 
which is a local minimum of the cost function, or gives a placement with another ordering 
π′ at which the cost function is lower than it is at any placement respecting π. Our 
heuristic search algorithm is simply “iterated linear solve”: 

1. choose a random ordering π; 

2. set up the linear system corresponding to that ordering; 

3. solve it; 

4. if the resulting order π′ is equal to π, record that as a potential minimum; 

5. if π′ ≠ π, replace π by π′ and return to step 2. 

Finally, we repeat this entire process for many random initial orderings, and report the 
lowest cost solution found. In different tests, we either did a fixed number of random 
starts, usually 300, or repeated until the known optimal solution was found. 
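The procedure can be sketched end to end in plain Python. Several choices here are assumptions made for illustration rather than details from the paper: the leftmost probe of each ordering is pinned at 0 to remove the translational freedom, the reduced system is solved by Gauss-Jordan elimination instead of a reused LU factorization, the measurement graph is assumed connected, and a `seen` set guards against revisiting an ordering.

```python
import random

def cost(x, d, w):
    """Equation 1: weighted squared mismatch over measured pairs."""
    return sum(w[i, j] * (abs(x[i] - x[j]) - d[i, j]) ** 2 for (i, j) in d)

def solve_fixed_order(order, d, w):
    """Least-squares positions for a fixed ordering, with order[0] pinned at 0."""
    n = len(order)
    rank = {p: k for k, p in enumerate(order)}
    M = [[0.0] * n for _ in range(n)]
    r = [0.0] * n
    for (i, j), dij in d.items():
        left, right = (i, j) if rank[i] < rank[j] else (j, i)
        M[i][i] += w[i, j]; M[j][j] += w[i, j]
        M[i][j] -= w[i, j]; M[j][i] -= w[i, j]
        r[right] += w[i, j] * dij
        r[left] -= w[i, j] * dij
    free = [p for p in range(n) if p != order[0]]
    # Augmented reduced system (row and column of the pinned probe dropped).
    A = [[M[a][b] for b in free] + [r[a]] for a in free]
    m = len(free)
    for c in range(m):                       # Gauss-Jordan, partial pivoting
        piv = max(range(c, m), key=lambda t: abs(A[t][c]))
        A[c], A[piv] = A[piv], A[c]
        for t in range(m):
            if t != c:
                f = A[t][c] / A[c][c]
                A[t] = [u - f * v for u, v in zip(A[t], A[c])]
    x = [0.0] * n
    for k, p in enumerate(free):
        x[p] = A[k][m] / A[k][k]
    return x

def iterated_linear_solve(n, d, w, starts=300, seed=0):
    """Steps 1-5 above, repeated from `starts` random orderings."""
    rng = random.Random(seed)
    best = None
    for _ in range(starts):
        order = list(range(n))
        rng.shuffle(order)                               # step 1
        seen = set()
        while tuple(order) not in seen:
            seen.add(tuple(order))
            x = solve_fixed_order(order, d, w)           # steps 2 and 3
            new_order = sorted(range(n), key=lambda p: x[p])
            if new_order == order:                       # step 4
                c = cost(x, d, w)
                if best is None or c < best[0]:
                    best = (c, order, x)
                break
            order = new_order                            # step 5
    return best

# Noise-free test instance: true positions 0, 3, 5, 9, all pairs measured.
truth = [0, 3, 5, 9]
d = {(i, j): abs(truth[i] - truth[j]) for i in range(4) for j in range(i + 1, 4)}
w = {pair: 1.0 for pair in d}
result = iterated_linear_solve(4, d, w)
```

On this small noise-free instance the search recovers the true ordering or its mirror image with essentially zero cost. On noisy data the result is only the best local minimum found, with no certificate of optimality.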

One nice feature of the matrix formulation is that M is independent of the ordering 
of the probes. When solving this system by LU decomposition (as in [8]), this means 
that once we perform an initial O(N³) operation on M, we can find a solution in O(N²) 
time per ordering, the time required to generate the (order-dependent) vector r and 
backsolve. 

5 Branch and Bound 

Our branch and bound algorithm constructs a search tree over probe orderings. The 
leaves will be complete orderings and the interior nodes will be partially specified 
orderings. There are two basic approaches to structuring the search tree. In the first 
approach, shown in Figure 4, the children of a node P in the tree will be the ordering 
of probes at node P augmented with a new probe in all possible positions among the 
probes ordered at P. 



[Figure 4: search tree rooted at A, with leaves CAB ACB ABC CBA BCA BAC] 

Fig. 4. At a node, the children are orderings in which an additional probe is 
placed in all possible positions with respect to the ordered probes. 

[Figure 5: search tree with second level AB AC BA BC CA CB and leaves ABC ACB BAC BCA CAB CBA] 

Fig. 5. At a node, the children are orderings in which each of the unordered 
probes has been placed to the right of the rightmost ordered probe. 



For the second approach, in Figure 5, the ordering of a child of an interior node P 
will be the ordering of P augmented by a probe placed adjacent to the rightmost ordered 
probe in P. 
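The two ways of generating children can be written down directly. In this sketch partial orderings are tuples of probe labels; the function names are illustrative.

```python
def children_insert(ordering, probe):
    """Approach one: insert `probe` at every position of the partial ordering."""
    return [ordering[:k] + (probe,) + ordering[k:]
            for k in range(len(ordering) + 1)]

def children_append(ordering, unordered):
    """Approach two: place each unordered probe to the right of the
    rightmost ordered probe."""
    return [ordering + (p,) for p in sorted(unordered)]

first = children_insert(("A", "B"), "C")
second = children_append(("A",), {"B", "C"})
```

Approach one yields one child per insertion position, while approach two yields one child per remaining probe; this is the source of the difference in branching factor between the two trees.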

In either approaeh, as is typical in branch and bound algorithms, little of the ordering 
is specified at higher levels of the search tree, henee the bounds eomputed there will be 
weak and pruning will be rare. Given this, the first approach has the advantage that 
the branching factor is much lower near the root of the tree compared to the second 
approach, e.g. 3 versus — 3 on the third level. On the other hand, the second approach 
has the advantage that more information is known about the partially speeified ordering 
at an interior node P, namely that all unordered probes lie to the right of the rightmost 
specified probe in every node of the subtree rooted at P. We can exploit this to give 
a strengthened bound at internal nodes compared to approach one. In our experiments 
[11], approach two outperformed approach one by nearly a factor of two both in run
time and in number of tree nodes visited. Throughout the remainder of this paper, we 
will only consider approach two. 

Our branch and bound algorithm searches through nodes in a tree, pruning a node 
if its cost is greater than the lowest cost found in a leaf node so far. At a leaf node in 
the tree, we compute the cost of the ordering as described in Section 3.1. At an interior 
node, the cost function must be a lower bound on the cost of all nodes in the subtree 
to allow us to possibly prune the subtree. In this section, we describe a simple cost 
function based on least squares. 
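The search just described can be sketched as follows. This is a minimal illustration of approach two, not the authors' C implementation: the names `branch_and_bound` and `make_toy_bound` are hypothetical, and the toy bound simply places the ordered probes at unit-spaced slots and sums squared deviations from the measured distances. It stands in for the least-squares bound only because it shares the one property the pruning rule needs: extending a prefix never lowers the cost.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Ordering = Tuple[str, ...]


def branch_and_bound(
    probes: Sequence[str],
    bound: Callable[[Ordering], float],
    initial_best: float = float("inf"),
) -> Tuple[float, Optional[Ordering], int]:
    """Depth-first search over left-to-right partial orderings (approach two).

    bound(prefix) must never exceed the cost of any complete ordering that
    extends prefix, and must equal the true cost when prefix is complete.
    Returns (best cost, best ordering or None, number of nodes visited).
    """
    best_cost = initial_best
    best_order: Optional[Ordering] = None
    visited = 0

    def visit(prefix: Ordering, remaining: Ordering) -> None:
        nonlocal best_cost, best_order, visited
        visited += 1
        cost = bound(prefix)
        if cost > best_cost:
            return  # prune: no ordering below this node can beat the incumbent
        if not remaining:
            # Leaf: keep it if it is the first solution or strictly better.
            if best_order is None or cost < best_cost:
                best_cost, best_order = cost, prefix
            return
        # Each child appends one unordered probe to the right of the prefix.
        for i, probe in enumerate(remaining):
            visit(prefix + (probe,), remaining[:i] + remaining[i + 1:])

    visit((), tuple(probes))
    return best_cost, best_order, visited


def make_toy_bound(distances: Dict[Tuple[str, str], float]) -> Callable[[Ordering], float]:
    """Toy stand-in for the least-squares bound (an assumption, not the paper's cost).

    Ordered probes sit at slots 0, 1, 2, ...; only measurements between two
    ordered probes contribute, so the value can only grow as probes are added.
    """
    def bound(prefix: Ordering) -> float:
        slot = {p: i for i, p in enumerate(prefix)}
        total = 0.0
        for (a, b), d in distances.items():
            if a in slot and b in slot:
                total += (abs(slot[a] - slot[b]) - d) ** 2
        return total

    return bound
```

Passing a finite `initial_best` (for example, the cost of a solution found by local search) tightens pruning from the first node onward.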

Fig. 6. An Interior Node. (Figure labels: Ordered Probes; Unordered Probes.)

Consider an interior node such as that in Figure 6. In this picture of an interior node,
the circles represent probes, and the edges represent the existence of a measurement
between two probes. Probes A, B, C, and D have been ordered (in that order). Probes E
and F are unordered with respect to each other, but both will appear to the right of probe
D. One way of computing a lower bound on the cost for this node is to consider only the 
measurements between the ordered probes. In this case, we compute the cost function 
by computing the matrix solution (as described in Section 3.1) using a matrix built from 
only measurements between the ordered probes. This is done by simply pretending the 
other measurements do not exist, i.e., the terms in M of Equation 5 for measurements 
that we are not considering are 0, and there is no contribution from them in the r vector. 

We note that the cost function described here is ineffective at high levels in the tree 
(where nodes will reflect probe orderings with few constraints). In particular, the cost 
function described evaluates to zero for the first and second level in the tree (when only 
one or two probes are ordered). However, consider the measurements in Figure 6 be- 
tween ordered probes C, D, and unordered probe F. Even though the position of F is 
undetermined with respect to E, we know that F will be to the right of D. This al- 
lows us to remove the absolute value sign in the sum of squares terms of Equation 1 
for the measurements between F and C, D and include these terms in the cost function 
computation. Thus, for an interior node, as well as considering all edges between or- 
dered nodes, we can consider edges between ordered nodes and unordered nodes when 
constructing the cost function for the node. This improvement potentially allows us 
to compute a non-zero cost function for nodes as high as the second level in the tree 
(when only two probes are ordered). With this improvement, the only constraints we 
are not considering at a node are those between unordered probes. The bound function 
described here is the one we use in the simulations reported in Section 6, Results. 

The cost of an interior node P computed in this way will be a lower bound on the 
cost of all nodes in the subtree rooted at P, since nodes in the subtree impose additional 
constraints on the ordering, never remove constraints, and each additional constraint 
adds additional non-negative terms to the cost function. 

An additional issue which has a strong effect on the performance of our branch and 
bound algorithm is initialization of the bound. Starting the algorithm with a conservative 
default value for the bound (like +∞) results in very poor pruning until a reasonably
good solution is encountered. Instead we first run the local search algorithm from a 
few random starting orderings. Empirically, this will quickly locate a good solution, 
facilitating good pruning from the beginning. In our experiments, branch and bound 
removes 100-1000 times as many nodes as a result [9]. 

There is one remaining detail to be specified: we need to modify the construction
of the matrix M of Equation 5. As it stands, the linear system Mx = r of Section 3.1 is
under-constrained (the null space of M has non-zero dimension). Because the system is
constructed from relative orderings between the probes, there is one degree of freedom:
the absolute position of the probes. This is remedied by modifying the system to
arbitrarily place probe 1 at location x_1 = 0. There may be additional degrees of freedom
in the solutions. In particular, at high levels in the tree the small set of ordered probes 
may be partitioned into several disconnected components whose relative positions are 
unconstrained. These situations are handled similarly; see [9] for details. 

Finally, we remark that the cost computed by the techniques outlined above is a 
lower bound, but not necessarily an attainable bound, on the cost of any ordering con- 
sistent with that specified at a search tree node. In particular, in the case where the 
solution to the linear system Mx = r exhibits a different ordering than the one from 
which the system was constructed, we know that the bound is not attainable by the de- 
sired ordering. It is still valid to use this bound to prune the search tree, since we know 
the bound is attainable by some (other) ordering. However, pruning could be improved 
if a higher lower bound could be computed in these cases. One possible approach to
doing so would be to use quadratic programming: minimization of the quadratic objective
function in Equation 1 subject to the linear constraints in Equation 2 is a convex
quadratic optimization problem, for which polynomial time algorithms are known; see,
for example, [4]. However, it is not clear whether the increased pruning efficiency would
offset the extra computational cost of using the more elaborate quadratic programming
algorithm. Preliminary experiments have been inconclusive [11].

We now present the results of experiments performed on the heuristic search and 
branch and bound algorithms. 



6 Results 

We ran multiple simulations to assess the performance of the two algorithms and also to 
gauge the sensitivity of the algorithms to different parameters. We summarize the main 
results here; see [9, 11] for further details. 

The experiments described below were all run on synthetic data generated in accord
with the motivating problem presented in Section 1. Probes were placed uniformly
at random, except that adjacent probes were separated by a minimum distance of
approximately 3% of the average spacing. Approximately 50% of the probe pairs were
“measured,” where measurement consisted of drawing a random sample from a certain
distribution whose mean was the actual distance between the probes. Data sets having
more than one connected component or certain other anomalies were filtered out. The 
results do not seem to be overly sensitive to any of these parameters. 

As a measure of the quality of the solution found by the algorithms, we used RMS 
error — the square root of the mean squared difference between the true and calculated 
positions of the probes. While this quantity varied from run to run, the median value was 
10%-15% of the average interprobe distance, which is reasonably good considering the 
variance of the “measurements.” 

We present the total time for the branch and bound algorithm using weighted least- 
squares in Figure 7, and the total time for heuristic search in Figure 8. Each point in the 
graph is a problem instance. 




[Plots omitted. Panel titles: "Time Taken For 300 Random Starts" and "Seconds per Problem Instance".]

Fig. 7. Time for Branch and Bound (Seconds).

Fig. 8. Time for Heuristic Search. Trials showing identical times spread horizontally for
clarity. (300 random starts; Milliseconds).

We can see that the running time of the branch and bound algorithm is exponential,
as is expected, with time increasing roughly as 2.8^N. Note that at the far right of the
graph, the most time taken to solve a problem of 18 probes was about 70 minutes. Since
the number of nodes in a search tree of a problem that size is around 10^16, we can see
that the pruning heuristic is quite effective; in fact it visited on the order of 10® nodes. 
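The size of the unpruned tree can be checked with a few lines of arithmetic. The sketch below assumes the approach-two tree has one node per ordered prefix of distinct probes, including an empty root; the function name `search_tree_nodes` is hypothetical. For 18 probes the leaves alone number 18!, about 6.4 × 10^15, so the whole tree has between 10^16 and 10^17 nodes.

```python
from math import factorial


def search_tree_nodes(n: int) -> int:
    """Number of nodes in the full approach-two search tree on n probes.

    A node at depth k is an ordered selection of k distinct probes, and there
    are n! / (n - k)! of those; k = 0 is the (assumed) empty root.
    """
    return sum(factorial(n) // factorial(n - k) for k in range(n + 1))


if __name__ == "__main__":
    print(search_tree_nodes(3))   # tiny case: 1 + 3 + 6 + 6 nodes
    print(search_tree_nodes(18))  # the 18-probe instances discussed above
```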

The performance of heuristic search is in some ways more difficult to assess. For a 
problem size of 18, the 300 random starts of heuristic search took about 1 second. The 
surprisingly stable growth rate also appears to be exponential, but grows much more
slowly, roughly as 1.2^N. At this rate, problems of size 30 would be solvable in a few
minutes and problems of size 50 in under an hour. However, note that 300 random starts
is a very arbitrary choice. In most trials (> 90%), the method finds the globally optimal
solution within 10 random starts. In a few “hard core” cases, however, it can take several
thousand starts to find the global optimum. Unfortunately, of course, using heuristic search alone,
one cannot tell when the globally optimal solution has been reached. (We compared to 
the provably optimal results from branch and bound.) Nevertheless, the method seems 
to be a powerful one and worth further study. 

Timing experiments were performed on a 100 MHz DEC AlphaStation 200 4/100
with 96MB of memory. The C code was not optimized beyond the optimizations de- 
scribed here (and in [9, 11]). In particular, the LU decomposition routine was copied 
without modification from [8]. Since the process size for these algorithms was around 3 
MB, and since the simulation code is CPU intensive, the time due to non-CPU activities 
(such as paging) does not significantly affect the results shown. 

7 Conclusions 

We have presented two search algorithms, a branch and bound algorithm and a heuristic 
local search algorithm, both of which attempt to minimize a weighted least-squares cost 
function to solve a one dimensional point placement problem. 

Due to the exponential nature of the branch and bound algorithm, it is unlikely that 
it will scale to larger problem sizes. However, it does provide good performance on 
problems of 18-20 probes, large enough to be of practical use. Since it finds the global 
minimum, it is also useful as a benchmark against which to compare other algorithms. 

The local search algorithm performed surprisingly well, finding optimal solutions
in seconds, and it appears capable of handling much larger problem instances.
Abstract. We present certain dualities occurring in problems of design
of virtual path layouts in ATM networks. We concentrate on the one-
to-many problem on a chain network, in which one constructs a set of
paths that enable connecting one vertex with all others in the network.
We consider the parameters of load (the maximum number of paths that 
go through any single edge) and hop count (the maximum number of 
paths traversed by any single message). Optimal results are known for 
the cases where the routes are shortest paths and for the general case 
of unrestricted paths. These solutions are symmetric with respect to the 
two parameters of load and hop count, and thus suggest duality between 
these two. 

We discuss these dualities from various points of view. The trivial one 
follows from corresponding recurrence relations. We then present var- 
ious one-to-one correspondences. In the case of shortest paths layouts 
we use binary trees and lattice paths (that use horizontal and vertical 
steps). In the general case we use ternary trees, lattice paths (that use 
horizontal, vertical and diagonal steps), and high dimensional spheres. 
These correspondences shed light on the structure of the optimal solu- 
tions, and simplify some of the proofs, especially for the optimal average 
case designs. 



1 Introduction 

In this paper we study path layouts in ATM networks, in which pairs of nodes 
exchange messages along pre-defined paths in the network, termed virtual paths. 
Given a physical network, the problem is to design these paths optimally. Each 
such design forms a layout of paths in the network, and each connection between 
two nodes must consist of a concatenation of such virtual paths. The smallest 
number of these paths between two nodes is termed the hop count for these 
nodes, and the load (or congestion) of a layout is the maximum number of vir- 
tual paths that go through any (physical) communication line. The two principal 
parameters that determine the optimality of the layout are the maximum con- 
gestion of any communication line and the maximum hop count between any two 
nodes. The hop count corresponds to the time to set up a connection between 
the two nodes, and the congestion measures the load of the routing tables at the 
nodes. 

Two problems that have been recently studied are the one-to-all (or broad- 
cast) problem (e.g., [CGZ94, GWZ95, G95, DFZ97]), and the all-to-all problem 
(see, e.g., [CGZ94, G95, KKP95, SV96, ABCRS99, DFZ97]), in which one wishes
to measure the hop count from one specified node (or all nodes) in the network 
to all other nodes. In what follows we always consider the one-to-all problem. 

Given bounds on the load $\mathcal{L}$ and the hop count $\mathcal{H}$ between a given node
termed root and all the other nodes in a layout, we look for the maximum number
of nodes for which such a solution exists, satisfying these bounds. Considering
a chain network, where the leftmost vertex has to be the root, and where each
path traversed by a message must be a shortest path, a family of ordered trees
$T_{short}(\mathcal{L},\mathcal{H})$ was presented in [GWZ95], within which an optimal solution can
be found, for a chain of length $\mathcal{N}$, with $\mathcal{N}$ bounded by $\binom{\mathcal{L}+\mathcal{H}}{\mathcal{L}}$. This number,
which is symmetric in $\mathcal{H}$ and $\mathcal{L}$, is also equal to the number of lattice paths from
$(0,0)$ to $(\mathcal{L},\mathcal{H})$ that use horizontal and vertical steps. Optimal bounds for this
shortest path case were also derived for the average case, which also turned out
to be symmetric in $\mathcal{H}$ and $\mathcal{L}$.

Considering the same problem but without the shortest path restriction,
termed the general path case, a family of tree layouts $T(\mathcal{L},\mathcal{H})$ was introduced
in [DFZ97], for a chain of length $\mathcal{N}$, not assuming that the root is located at its
leftmost vertex, and with $\mathcal{N}$ bounded by
$\sum_{i=0}^{\min\{\mathcal{L},\mathcal{H}\}} 2^i \binom{\mathcal{L}}{i}\binom{\mathcal{H}}{i}$ [GW70]. This
number, which is also symmetric in $\mathcal{H}$ and $\mathcal{L}$, is equal to the number of lattice
points within an $\mathcal{L}$-dimensional $l_1$-sphere of radius $\mathcal{H}$, and is also equal to the
number of lattice paths from $(0,0)$ to $(\mathcal{L},\mathcal{H})$ that use horizontal, vertical or
(up-)diagonal steps.

As a consequence, the trees $T(\mathcal{L},\mathcal{H})$ and $T(\mathcal{H},\mathcal{L})$ have the same number
of nodes, and so do the trees $T_{short}(\mathcal{L},\mathcal{H})$ and $T_{short}(\mathcal{H},\mathcal{L})$. In this paper
we use one-to-one correspondences, using binary and ternary trees, in order to
combinatorially explain the duality between these two measures of hop count
and load, as reflected by the above symmetries. These correspondences shed
more light on the structure of these two families of trees, allowing one to find, for
any optimal layout with $\mathcal{N}$ nodes, load $\mathcal{L}$ and minimal (or minimal average) hop
count $\mathcal{H}$, its dual layout, having $\mathcal{N}$ nodes, maximal hop count $\mathcal{L}$ and minimal
(or minimal average) load $\mathcal{H}$, and vice versa. Moreover, they give one proof for
both measures, whereas in the above-mentioned papers these symmetries were
only derived as a consequence of the final result; we note that the average-case
results were derived by seemingly different formulas, whereas the worst-case
results were derived by symmetric arguments. In addition, these correspondences
also provide a simple proof of a new result concerning the duality of these two
parameters in the worst case and the average case analysis for the general path
case layouts. Finally, it is shown that an optimal worst case solution for the
shortest path and general cases is also an optimal average case solution in both
cases, allowing a simpler characterization of these optimal layouts.

This paper surveys results from various papers. In Section 2 the ATM model 
is presented, following [CGZ94]. In Section 3 we discuss the optimal solutions; 
the optimal design for the shortest path case follows the discussion in [GWZ95] , 
and the optimal design for the general case follows the discussion in [DFZ97, 
F98]. We encounter the duality of the parameters of load and hop count, which 
follows via recurrence relations. In Section 4 we describe the use of binary
and ternary trees to shed more direct light on these duality results; this follows
[DFZ97, F98]. Lattice paths and spheres are then used in Section 5 to supply
additional points of view for these dualities, following [DFZ97, F98]. We close
with a discussion in Section 6.

2 The Model 

We model the underlying communication network as an undirected graph $G = (V,E)$,
where the set $V$ of vertices corresponds to the set of switches, and the
set $E$ of edges corresponds to the physical links between them.

Definition 1. A rooted virtual path layout (layout for short) $\Psi$ is a collection
of simple paths in $G$, termed virtual paths (VPs for short), and a vertex $r \in V$
termed the root of the layout (denoted $root(\Psi)$).

Definition 2. The load $\mathcal{L}(e)$ of an edge $e \in E$ in a layout $\Psi$ is the number of
VPs $\psi \in \Psi$ that include $e$.

Definition 3. The load $\mathcal{L}_{max}(\Psi)$ of a layout $\Psi$ is $\max_{e \in E} \mathcal{L}(e)$.

Definition 4. The hop count $\mathcal{H}(v)$ of a vertex $v \in V$ in a layout $\Psi$ is the
minimum number of VPs whose concatenation forms a path in $G$ from $v$ to
$root(\Psi)$. If no such VPs exist, define $\mathcal{H}(v) = \infty$.

Definition 5. The maximal hop count of $\Psi$ is $\mathcal{H}_{max}(\Psi) = \max_{v \in V} \mathcal{H}(v)$.
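Definitions 2 to 5 translate directly into code for a chain. The sketch below is only an illustration under assumed conventions that are not part of the paper: chain vertices are numbered 0 to n-1, physical link i joins vertices i and i+1, a VP is a pair of endpoints, and the function names are hypothetical.

```python
from collections import deque
from typing import List, Sequence, Tuple

VP = Tuple[int, int]  # endpoints of a virtual path on the chain


def link_loads(n: int, vps: Sequence[VP]) -> List[int]:
    """Load of each physical link i = (i, i+1): the number of VPs covering it."""
    loads = [0] * (n - 1)
    for a, b in vps:
        for link in range(min(a, b), max(a, b)):
            loads[link] += 1
    return loads


def hop_counts(n: int, vps: Sequence[VP], root: int = 0) -> List[float]:
    """Minimum number of VPs to concatenate from each vertex to the root.

    Breadth-first search on the graph whose edges are the VPs; vertices that
    cannot reach the root keep the value infinity, as in Definition 4.
    """
    neighbours: List[List[int]] = [[] for _ in range(n)]
    for a, b in vps:
        neighbours[a].append(b)
        neighbours[b].append(a)
    hops: List[float] = [float("inf")] * n
    hops[root] = 0
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in neighbours[u]:
            if hops[w] == float("inf"):
                hops[w] = hops[u] + 1
                queue.append(w)
    return hops
```

For a chain of 5 vertices, one VP per physical link gives maximal load 1 and maximal hop count 4, while a direct VP from vertex 0 to every other vertex gives maximal hop count 1 and maximal load 4.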

In the rest of this paper we assume that the underlying network is a chain. 
We consider two cases: the one in which only shortest paths are allowed, and the 
second one in which general paths are considered. 

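To minimize the load, one can use a layout $\Psi$ which has a VP on each
physical link, i.e., $\mathcal{L}_{max}(\Psi) = 1$; however, such a layout has a hop count of
$\mathcal{N} - 1$. The other extreme is connecting a direct VP from the root to each
other vertex, yielding $\mathcal{H}_{max} = 1$, but then $\mathcal{L}_{max} = \mathcal{N} - 1$. For the intermediate
cases we need the following definitions.

Definition 6. $\mathcal{H}_{opt}(\mathcal{N},\mathcal{L})$ denotes the optimal hop count of any layout $\Psi$
on a chain of $\mathcal{N}$ vertices such that $\mathcal{L}_{max}(\Psi) \le \mathcal{L}$, i.e.,
$\mathcal{H}_{opt}(\mathcal{N},\mathcal{L}) = \min_{\Psi} \{\mathcal{H}_{max}(\Psi) : \mathcal{L}_{max}(\Psi) \le \mathcal{L}\}$.

Definition 7. $\mathcal{L}_{opt}(\mathcal{N},\mathcal{H})$ denotes the optimal load of any layout $\Psi$ on a chain
of $\mathcal{N}$ vertices such that $\mathcal{H}_{max}(\Psi) \le \mathcal{H}$, i.e.,
$\mathcal{L}_{opt}(\mathcal{N},\mathcal{H}) = \min_{\Psi} \{\mathcal{L}_{max}(\Psi) : \mathcal{H}_{max}(\Psi) \le \mathcal{H}\}$.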
Definition 8. Two VPs constitute a crossing if their endpoints $l_1, l_2$ and $r_1, r_2$
satisfy $l_1 < l_2 < r_1 < r_2$. A layout is called crossing-free if no pair of VPs
constitute a crossing.

It is known ([GWZ95, ABCRS99]) that for each performance measure ($\mathcal{L}_{max}$,
$\mathcal{H}_{max}$, $\mathcal{L}_{avg}$, $\mathcal{H}_{avg}$) there exists an optimal layout which is crossing-free. In the
rest of the paper we restrict ourselves to layouts viewed as a planar (that is,
crossing-free) embedding of a tree on the chain, also termed tree layouts. Therefore,
when no confusion occurs, we refer to each VP in a given layout $\Psi$ as an edge
of $\Psi$.

$\mathcal{N}_{short}(\mathcal{L},\mathcal{H})$ denotes the length of a longest chain in which one node can
broadcast to all others, with at most $\mathcal{H}$ hops and a load bounded by $\mathcal{L}$, for the
case of shortest paths. The similar measure for the general case is denoted by
$\mathcal{N}(\mathcal{L},\mathcal{H})$.

3 Optimal Solutions and Their Duality 

In this section we present the optimal solutions for layouts, when messages have 
to travel either along shortest paths or general paths. We’ll show the symmetric 
role played by the load and hop count, and explain it via the corresponding 
recurrence relations. 



3.1 Optimal Virtual Path for the Shortest Path Case 

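Assuming that the leftmost node in the chain has to broadcast to each node to
its right, it is clear that, for given $\mathcal{H}$ and $\mathcal{L}$, the largest possible chain for which
such a design exists is like the one shown in Fig. 1.

Recall that $\mathcal{M}_{short}(\mathcal{L},\mathcal{H})$ is the length of the longest chain in which a design
exists, for a broadcast from the leftmost node to all others, for given parameters
$\mathcal{H}$ and $\mathcal{L}$. $\mathcal{M}_{short}(\mathcal{L},\mathcal{H})$ clearly satisfies the following recurrence relation:

$\mathcal{M}_{short}(0,\mathcal{H}) = \mathcal{M}_{short}(\mathcal{L},0) = 1 \quad \forall \mathcal{H},\mathcal{L} \ge 0$   (1)

$\mathcal{M}_{short}(\mathcal{L},\mathcal{H}) = \mathcal{M}_{short}(\mathcal{L},\mathcal{H}-1) + \mathcal{M}_{short}(\mathcal{L}-1,\mathcal{H}) \quad \forall \mathcal{H},\mathcal{L} > 0$.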




48 



Shmuel Zaks 



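It easily follows that

$\mathcal{M}_{short}(\mathcal{L},\mathcal{H}) = \binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}}$   (2)

This design is clearly symmetric in $\mathcal{H}$ and $\mathcal{L}$, which establishes the first result
in which the load and hop count play symmetric roles.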
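The step from the recurrence to the closed form can be checked numerically. The sketch below assumes that recurrence (1) is the Pascal-type recurrence with boundary value 1 and that its solution is the binomial coefficient "load + hops choose hops"; the function name `chain_length_short` is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def chain_length_short(load: int, hops: int) -> int:
    """Longest chain for a leftmost root in the shortest path case, by recurrence."""
    if load == 0 or hops == 0:
        return 1  # boundary case: a single node
    return chain_length_short(load, hops - 1) + chain_length_short(load - 1, hops)


if __name__ == "__main__":
    for load in range(6):
        for hops in range(6):
            # The recurrence, the binomial closed form, and the swapped
            # arguments all agree, which is the symmetry noted in the text.
            assert chain_length_short(load, hops) == comb(load + hops, hops)
            assert chain_length_short(load, hops) == chain_length_short(hops, load)
    print(chain_length_short(2, 3))
```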

Note that it is clear that the maximal number of nodes in a chain,
$\mathcal{N}_{short}(\mathcal{L},\mathcal{H})$, to which one node can broadcast using shortest paths, satisfies

$\mathcal{N}_{short}(\mathcal{L},\mathcal{H}) = 2\binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}} - 1$   (3)

The above discussion, and Fig. 1, clearly give rise to the trees $T_{short}(\mathcal{L},\mathcal{H})$
defined as follows.

Definition 9. The tree layout $T_{short}(\mathcal{L},\mathcal{H})$ is defined recursively as follows.
$T_{short}(\mathcal{L},0)$ and $T_{short}(0,\mathcal{H})$ are tree layouts with a unique node. Otherwise,
the root of a tree layout $T_{short}(\mathcal{L},\mathcal{H})$ is the leftmost node of a $T_{short}(\mathcal{L}-1,\mathcal{H})$
tree layout, and it is also connected by a VP to the leftmost node of a tree layout
$T_{short}(\mathcal{L},\mathcal{H}-1)$ placed immediately to its right.
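The recursive definition can be unrolled into an explicit list of VPs. The sketch below rests on one reading of Definition 9, which is an assumption: the sub-layout with load reduced by one occupies the left part of the chain and shares the root, while the sub-layout with hop count reduced by one sits immediately to its right and is reached from the root by a single VP. The function names are hypothetical.

```python
from typing import List, Tuple

VP = Tuple[int, int]  # (parent, child), with parent to the left of child


def build_t_short(load: int, hops: int, first: int = 0) -> Tuple[int, List[VP]]:
    """Return (number of vertices, VPs) of the layout placed at first, first+1, ..."""
    if load == 0 or hops == 0:
        return 1, []
    left_size, left_vps = build_t_short(load - 1, hops, first)
    right_root = first + left_size
    right_size, right_vps = build_t_short(load, hops - 1, right_root)
    # The root's new VP spans the whole left part, raising its link loads by one.
    return left_size + right_size, left_vps + [(first, right_root)] + right_vps


def summarize(load: int, hops: int) -> Tuple[int, int, int]:
    """Return (vertices, maximal link load, maximal hop count) of the layout."""
    n, vps = build_t_short(load, hops)
    loads = [0] * (n - 1)
    hop = {0: 0}
    for parent, child in vps:
        for link in range(parent, child):
            loads[link] += 1
        hop[child] = hop[parent] + 1  # every parent is listed before its children
    return n, max(loads, default=0), max(hop.values())
```

Under this reading, `summarize(3, 2)` describes a 10-vertex layout with maximal load 3 and maximal hop count 2.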

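Using these trees, it is easy to show that $\mathcal{L}_{max}(T_{short}(\mathcal{L},\mathcal{H})) = \mathcal{L}$ and
$\mathcal{H}_{max}(T_{short}(\mathcal{L},\mathcal{H})) = \mathcal{H}$. The following two theorems follow:

Theorem 1. Consider a chain of $\mathcal{N}$ vertices and a maximal load requirement
$\mathcal{L}$. Let $\mathcal{H}$ be such that

$\binom{\mathcal{L}+\mathcal{H}-1}{\mathcal{L}} < \mathcal{N} \le \binom{\mathcal{L}+\mathcal{H}}{\mathcal{L}}$.

Then $\mathcal{H}_{opt}(\mathcal{N},\mathcal{L}) = \mathcal{H}$.

Theorem 2. Consider a chain of $\mathcal{N}$ vertices and a maximal hop requirement
$\mathcal{H}$. Let $\mathcal{L}$ be such that

$\binom{\mathcal{L}+\mathcal{H}-1}{\mathcal{H}} < \mathcal{N} \le \binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}}$.

Then $\mathcal{L}_{opt}(\mathcal{N},\mathcal{H}) = \mathcal{L}$.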
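Theorems 1 and 2 amount to inverting a binomial coefficient. The sketch below assumes that the condition in both theorems is the sandwich "(L+H-1 choose L) < N <= (L+H choose L)", as the closed form (2) suggests; the function names are hypothetical, and by symmetry the same search answers both questions.

```python
from math import comb


def optimal_hops(n: int, load: int) -> int:
    """Smallest hop bound H for which a chain of n vertices fits under `load`.

    Returns the least H with comb(load + H, load) >= n, which is the H singled
    out by the (assumed) condition of Theorem 1.
    """
    if n > 1 and load < 1:
        raise ValueError("a chain with more than one vertex needs load >= 1")
    hops = 0
    while comb(load + hops, load) < n:
        hops += 1
    return hops


def optimal_load(n: int, hops: int) -> int:
    """Smallest load bound for n vertices under `hops`; symmetric to optimal_hops."""
    return optimal_hops(n, hops)
```

With load 1 the answer is n - 1 hops (one VP per physical link), and with hop bound 1 the answer is load n - 1 (a direct VP to every vertex), matching the two extremes of Section 2.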

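Optimal bounds were also derived in [GWZ95, GWZ97] for the average case,
using dynamic programming; the results use different recursive constructions,
but end up in structures that are symmetric in $\mathcal{H}$ and $\mathcal{L}$. These results are
stated as follows:

Theorem 3. Let $N$ and $\mathcal{H}$ be given. Let $\mathcal{L}$ be the largest integer such that
$N \ge \binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}}$, and let $r = N - \binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}}$. Then

$\mathcal{L}_{tot}(N,\mathcal{H}) = \mathcal{H}\binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}+1} + r(\mathcal{L}+1)$.

Theorem 4. Let $N$ and $\mathcal{L}$ be given. Let $\mathcal{H}$ be the largest integer such that
$N \ge \binom{\mathcal{L}+\mathcal{H}}{\mathcal{L}}$, and let $r = N - \binom{\mathcal{L}+\mathcal{H}}{\mathcal{L}}$. Then

$\mathcal{H}_{tot}(N,\mathcal{L}) = \mathcal{L}\binom{\mathcal{L}+\mathcal{H}}{\mathcal{L}+1} + r(\mathcal{H}+1)$.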

3.2 Optimal Virtual Path for the General Case 

In the case where not only shortest paths are traversed, a new family of optimal
tree layouts $T(\mathcal{L},\mathcal{H})$ is now presented.

Definition 10. The tree layout $T(\mathcal{L},\mathcal{H})$ is defined recursively as follows.
$T_{right}(\mathcal{L},0)$, $T_{right}(0,\mathcal{H})$, $T_{left}(\mathcal{L},0)$ and $T_{left}(0,\mathcal{H})$ are tree layouts with a
unique node. Otherwise, the root $r$ is also the rightmost node of a tree layout
$T_{right}(\mathcal{L},\mathcal{H})$ and the leftmost node of a tree layout $T_{left}(\mathcal{L},\mathcal{H})$, where the tree
layouts $T_{left}(\mathcal{L},\mathcal{H})$ and $T_{right}(\mathcal{L},\mathcal{H})$ are also defined recursively as follows. The
root of a tree layout $T_{left}(\mathcal{L},\mathcal{H})$ is the leftmost node of a $T_{left}(\mathcal{L}-1,\mathcal{H})$ tree
layout, and it is also connected to a node which is the root of a tree layout
$T_{right}(\mathcal{L}-1,\mathcal{H}-1)$ and a tree layout $T_{left}(\mathcal{L},\mathcal{H}-1)$ (see Fig. 2). Note that
the root of $T_{left}(\mathcal{L},\mathcal{H})$ is its leftmost node. The tree layout $T_{right}(\mathcal{L},\mathcal{H})$ is
defined as the mirror image of $T_{left}(\mathcal{L},\mathcal{H})$.

Fig. 2. $T_{left}(\mathcal{L},\mathcal{H})$ recursive definition



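Denote by $\mathcal{N}(\mathcal{L},\mathcal{H})$ the longest chain in which it is possible to connect one
node to all others, with at most $\mathcal{H}$ hops and the load bounded by $\mathcal{L}$. From the
above, it is clear that this chain is constructed from two chains as above, glued
at their root. $\mathcal{N}(\mathcal{L},\mathcal{H})$ clearly satisfies the following recurrence relation:

$\mathcal{N}(0,\mathcal{H}) = \mathcal{N}(\mathcal{L},0) = 1 \quad \forall \mathcal{H},\mathcal{L} \ge 0$   (4)

$\mathcal{N}(\mathcal{L},\mathcal{H}) = \mathcal{N}(\mathcal{L},\mathcal{H}-1) + \mathcal{N}(\mathcal{L}-1,\mathcal{H}) + \mathcal{N}(\mathcal{L}-1,\mathcal{H}-1) \quad \forall \mathcal{H},\mathcal{L} > 0$.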

Again, the symmetric role of the hop count and the load are clear both from 
the definition of the corresponding trees and from the recurrence relations that 
compute their sizes. 

It is known ([GW70]) that the solution to the recurrence relation (4) is given
by

$\mathcal{N}(\mathcal{L},\mathcal{H}) = \sum_{i=0}^{\min\{\mathcal{L},\mathcal{H}\}} 2^i \binom{\mathcal{L}}{i}\binom{\mathcal{H}}{i}$   (5)
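The closed form can be checked against the recurrence and against the sphere interpretation mentioned in the introduction. The sketch below assumes that the solution of recurrence (4) is the sum over i of 2^i times "load choose i" times "hops choose i"; the function names are hypothetical, and the lattice-point count is a brute-force enumeration meant only for small parameters.

```python
from functools import lru_cache
from itertools import product
from math import comb


@lru_cache(maxsize=None)
def chain_length_general(load: int, hops: int) -> int:
    """Longest chain in the general path case, computed from the recurrence."""
    if load == 0 or hops == 0:
        return 1
    return (chain_length_general(load, hops - 1)
            + chain_length_general(load - 1, hops)
            + chain_length_general(load - 1, hops - 1))


def closed_form(load: int, hops: int) -> int:
    """Assumed closed form: sum of 2^i * C(load, i) * C(hops, i)."""
    return sum(2 ** i * comb(load, i) * comb(hops, i)
               for i in range(min(load, hops) + 1))


def lattice_points_in_l1_sphere(dimension: int, radius: int) -> int:
    """Count integer points x with |x_1| + ... + |x_d| <= radius by enumeration."""
    coords = range(-radius, radius + 1)
    return sum(1 for x in product(coords, repeat=dimension)
               if sum(abs(c) for c in x) <= radius)
```

For load 2 and hop count 2 all three computations give 13.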



4 Duality: Binary Trees and Ternary Trees 

We saw in Section 3 that the layouts $T_{short}(\mathcal{L},\mathcal{H})$ and $T_{short}(\mathcal{H},\mathcal{L})$, and also
$T(\mathcal{L},\mathcal{H})$ and $T(\mathcal{H},\mathcal{L})$, have the same number of vertices. We now turn to show
that each pair within these virtual path layouts are, actually, quite strongly
related. In Section 4.1 we deal with layouts that use shortest-length paths, and
show their close relations to a certain class of binary trees, and in Section 4.2
we deal with the general layouts and show their close relations to a certain class
of ternary trees.

4.1 $T_{short}(\mathcal{L},\mathcal{H})$ and Binary Trees

In this section we show how to transform any layout $\Psi$ with hop count bounded
by $\mathcal{H}$ and load bounded by $\mathcal{L}$, for layouts using only shortest paths, into a
layout $\overline{\Psi}$ (its dual) with hop count bounded by $\mathcal{L}$ and load bounded by $\mathcal{H}$. In
particular, this mapping will transform $T_{short}(\mathcal{L},\mathcal{H})$ into $T_{short}(\mathcal{H},\mathcal{L})$.

To show this, we use a transformation between any layout with $x$ edges (VPs)
and binary trees with $x$ nodes (in a binary tree, each internal node has a
left child and/or a right child). We'll derive our main correspondence between
$T_{short}(\mathcal{H},\mathcal{L})$ and $T_{short}(\mathcal{L},\mathcal{H})$ for $x = \mathcal{N} - 1$, where
$\mathcal{N} = \binom{\mathcal{L}+\mathcal{H}}{\mathcal{H}}$. Our correspondence is done in three steps, as follows.

Step 1: Given a planar layout $\Psi$ we transform it into a binary tree $T = b(\Psi)$,
under which each edge $e$ is mapped to a node $b(e)$, as follows. Let $e = (r, v)$ be
the edge outgoing the root $r$ to the rightmost vertex (to which there is a VP;
we call this a 1-level edge). This edge $e$ is mapped to the root $b(e)$ of $T$. Remove
$e$ from $\Psi$. As a consequence, two layouts remain: $\Psi_1$ with root $r$ and $\Psi_2$ with
root $v$, where their roots are located at the leftmost vertices of both layouts.
Recursively, the left child of node $b(e)$ will be $b(\Psi_1)$ and its right child will be
$b(\Psi_2)$. If any of the layouts $\Psi_i$ is empty, so is its image $b(\Psi_i)$ (in other words, we
can stop when a $\Psi_i$ that consists of a single edge is mapped to a binary tree that
consists of a single vertex).

Step 2: Build a binary tree $\overline{T}$, which is a reflection of $T$ (that is, we exchange
the left child and the right child of each vertex).

Step 3: We transform back the binary tree $\overline{T}$ into the (unique) layout $\overline{\Psi}$ such
that $b(\overline{\Psi}) = \overline{T}$.
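The three steps can be sketched in code for the shortest path case. The conventions below are assumptions made for illustration: the chain vertices are 0 to n-1 with the root at vertex 0, a layout is a list of VPs written as (parent, child) with parent to the left, a binary tree is a nested pair (left, right) with None for the empty tree, and all function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

VP = Tuple[int, int]
Tree = Optional[tuple]  # (left subtree, right subtree), or None when empty


def to_tree(vps: Sequence[VP], lo: int, hi: int) -> Tree:
    """Step 1: binary tree of the layout on vertices lo..hi rooted at lo."""
    if lo == hi:
        return None  # a single vertex carries no edge
    # The 1-level edge goes from the root to its rightmost VP neighbour v.
    v = max(b for a, b in vps if a == lo and b <= hi)
    return (to_tree(vps, lo, v - 1), to_tree(vps, v, hi))


def reflect(tree: Tree) -> Tree:
    """Step 2: exchange the left and right child of every node."""
    return None if tree is None else (reflect(tree[1]), reflect(tree[0]))


def size(tree: Tree) -> int:
    return 0 if tree is None else 1 + size(tree[0]) + size(tree[1])


def to_layout(tree: Tree, lo: int = 0) -> List[VP]:
    """Step 3: the unique layout rooted at lo whose binary tree is `tree`."""
    if tree is None:
        return []
    left, right = tree
    v = lo + size(left) + 1  # the left sub-layout occupies lo..v-1
    return [(lo, v)] + to_layout(left, lo) + to_layout(right, v)


def dual(vps: Sequence[VP], n: int) -> List[VP]:
    """Dual layout of a tree layout on chain vertices 0..n-1 rooted at 0."""
    return to_layout(reflect(to_tree(vps, 0, n - 1)))
```

On three vertices, the dual of the layout with VPs (0, 1) and (1, 2), which has load 1 and hop count 2, consists of the VPs (0, 1) and (0, 2), which has load 2 and hop count 1.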

Example 1. In Fig. 3 the layouts for $\mathcal{L} = 2, \mathcal{H} = 3$ and $\mathcal{L} = 3, \mathcal{H} = 2$ are shown,
together with the corresponding trees $T_{short}(2,3)$ and $T_{short}(3,2)$, and the
corresponding binary trees constructed as explained above. The edge $e$ in the layout
$T_{short}(3,2)$ is assigned the vertex $b(e)$ in the corresponding tree $b(T_{short}(3,2))$.

[Figure omitted. Values recovered from the figure:]

$T_{short}(3,2)$: Load 3 3 2 3 2 1 3 2 1; Hop count 1 2 1 2 2 1 2 2 2
$T_{short}(2,3)$: Load 2 2 2 1 2 2 1 2 1; Hop count 1 2 3 1 2 3 2 3 3
Binary trees: $T = b(T_{short}(3,2))$ and $b(T_{short}(2,3))$

Fig. 3. An example of the transformation using binary trees



Given a non-crossing layout $\Psi$, we define the level of an edge $e$ in $\Psi$, denoted
$level_{\Psi}(e)$ (or $level(e)$ for short), to be one plus the number of edges above $e$ in
$\Psi$. In addition, to each edge $e$ of the layout $\Psi$ we assign its farthest end-point
from the root, $v(e)$.

Example 2. In Fig. 3 the edge $e$ in the layout $T_{short}(3,2)$ is assigned the vertex
$v(e)$ in this layout, and its level $level(e)$ is 2.

One of our key observations is the following theorem:

Theorem 5. For every $\mathcal{H}$ and $\mathcal{L}$, the trees $b(T_{short}(\mathcal{L},\mathcal{H}))$ and $b(T_{short}(\mathcal{H},\mathcal{L}))$
are reflections of each other.

This clearly establishes a one-to-one mapping between these trees, and thus 
establishes the required duality. 

To further investigate the structure of these trees, we now turn to explore the 
properties of the binary trees that we have defined above. We prove the following 
theorem: 

Theorem 6. Given a layout $\Psi$, let $T = b(\Psi)$ be the binary tree assigned to it by
the transformation above. Let $d_T^L(v)$ ($d_T^R(v)$) be equal to one plus the number of
left (right) steps in the path from the root $r$ to $v$, for every node $v$ in $T$. Then,
for every edge $e$ in the layout $\Psi$:

1. $\mathcal{H}_{\Psi}(v(e)) = d_T^R(b(e))$, and

2. $level(e) = d_T^L(b(e))$.

Given a non-crossing layout $\Psi$, for each physical link $e'$ we assign an edge
$\phi(e')$ in $\Psi$ that includes it and is of highest level (such a path exists due to the
connectivity and planarity of the layout; see physical link $e'$ and edge $\phi(e')$ in
Fig. 3). It can be easily proved that:

Lemma 1. Given a non-crossing tree layout $\Psi$, the mapping of a physical link
$e'$ to an edge $\phi(e')$ described above is one-to-one.

Proposition 1. Given a non-crossing tree layout $\Psi$ over a physical network, let
$T = b(\Psi)$ be the binary tree assigned to it. Then $\mathcal{L}(e') = level(\phi(e'))$ for every
edge $e'$ in the physical network.

Given a layout $\Psi$ over a chain network, if we consider the multiset
$\{d_T^R(v) \mid v \in b(\Psi)\}$ we get exactly the multiset of hop counts of the vertices of this network
(by Theorem 6), and if we consider the multiset $\{d_T^L(v) \mid v \in b(\Psi)\}$ we get exactly
the multiset of loads of the physical links of this network (by Theorem 6 and
Proposition 1). By using this and finding the dual layout $\overline{\Psi}$ with the multisets
$\{d_{\overline{T}}^R(v) \mid v \in b(\overline{\Psi})\}$ of hop counts of its vertices and
$\{d_{\overline{T}}^L(v) \mid v \in b(\overline{\Psi})\}$ of loads of
its physical edges, we observe that the multiset of hop counts of $\Psi$ is exactly
the multiset of loads of $\overline{\Psi}$, and the multiset of loads of $\Psi$ is also the multiset of
hop counts of $\overline{\Psi}$, thus deriving a complete combinatorial explanation for the
symmetric results of Section 3.1 for either the worst case trees or average case
trees:

Theorem 7. Given an optimal layout $\Psi$ with $\mathcal{N}$ nodes, load bounded by $\mathcal{L}$ and
optimal hop count $\mathcal{H}_{opt}(\mathcal{N},\mathcal{L})$, its dual layout $\overline{\Psi}$ has $\mathcal{N}$ nodes, hop count bounded
by $\mathcal{L}$ and optimal load $\mathcal{H}_{opt}(\mathcal{N},\mathcal{L})$.

Theorem 8. Given an optimal layout $\Psi$ with $\mathcal{N}$ nodes, hop count bounded by
$\mathcal{H}$ and optimal load $\mathcal{L}_{opt}(\mathcal{N},\mathcal{H})$, its dual layout $\overline{\Psi}$ has $\mathcal{N}$ nodes, load bounded by
$\mathcal{H}$ and optimal hop count $\mathcal{L}_{opt}(\mathcal{N},\mathcal{H})$.

Theorem 9. Given an optimal layout $\Psi$ with $\mathcal{N}$ nodes, load bounded by $\mathcal{L}$ and
optimal average hop count, its dual layout $\overline{\Psi}$ has $\mathcal{N}$ nodes, hop count bounded by
$\mathcal{L}$ and optimal average load.

Theorem 10. Given an optimal layout $\Psi$ with $\mathcal{N}$ nodes, hop count bounded by
$\mathcal{H}$ and optimal average load, its dual layout $\overline{\Psi}$ has $\mathcal{N}$ nodes, load bounded by
$\mathcal{H}$ and optimal average hop count.
4.2 $T(\mathcal{L},\mathcal{H})$ and Ternary Trees

We now extend the technique developed in Section 4.1 to general path case
layouts; we show how to transform any layout $\Psi$ with hop count bounded by $\mathcal{H}$
and load bounded by $\mathcal{L}$ into a layout $\overline{\Psi}$ (its dual) with hop count bounded by
$\mathcal{L}$ and load bounded by $\mathcal{H}$. In particular, this mapping will transform $T(\mathcal{L},\mathcal{H})$
into $T(\mathcal{H},\mathcal{L})$.

To show this, we use a transformation between any layout with $x$ edges (VPs)
and ternary trees with $x$ nodes (in a ternary tree, each internal node has a left
child and/or a middle child and/or a right child). Our correspondence is done
in three steps, as follows.

Step 1: Given a planar layout $\Psi$ we transform it into a ternary tree $T = t(\Psi)$,
under which each edge $e$ is mapped to a node $t(e)$, as follows. Let $e = (r, v)$ be
the edge outgoing the root $r$ to the rightmost vertex (to which there is a VP; we
call this a 1-level edge). This edge $e$ is mapped to the root $t(e)$ of $T$. Remove $e$
from $\Psi$. As a consequence, three layouts remain: $\Psi_1$ with root $r$ and $\Psi_3$ with
root $v$ (where their roots are located at the leftmost vertices of both layouts)
and $\Psi_2$ with root $v$ (where $v$ is its rightmost vertex). Recursively, the left child
of node $t(e)$ will be $t(\Psi_1)$, its middle child will be $t(\Psi_2)$ and its right child will
be $t(\Psi_3)$. If any of the layouts $\Psi_i$ is empty, so is its image $t(\Psi_i)$ (in other words,
we can stop when a $\Psi_i$ that consists of a single edge is mapped to a ternary tree
that consists of a single vertex).

Step 2: Build a ternary tree $\overline{T}$, which is a reflection of $T$ (that is, we exchange
the left child and the right child of each vertex; the middle child does not change).

Step 3: We transform back the ternary tree $\overline{T}$ into the (unique) layout $\overline{\Psi}$ such
that $t(\overline{\Psi}) = \overline{T}$.

See Fig. 4 for an example of this transformation. 



[Figure: panel labels recovered from the figure are $T_{left}(3,2)$, with loads 332332123321, and $T_{left}(2,3)$, with 222122221221.]

Fig. 4. An example of the transformation using ternary trees

One of our key observations is the following theorem:

Theorem 11. For every $\mathcal{H}$ and $\mathcal{L}$, the trees $t(T(\mathcal{L},\mathcal{H}))$ and $t(T(\mathcal{H},\mathcal{L}))$ are
reflections of each other.

This clearly establishes a one-to-one mapping between these trees, and thus 
establishes the required duality. 

To further investigate the structure of these trees, we now turn to explore
the properties of the ternary trees that we have defined above. We prove the
following theorem. Note that the definitions of level (of an edge) and $\phi$ (of a
physical link) remain exactly the same as in Section 4.1.

Theorem 12. Given a layout $\Psi$, let $T = t(\Psi)$ be the ternary tree assigned to
it by the transformation above. Let $d_T^{LM}(v)$ ($d_T^{RM}(v)$) be equal to one plus the
number of left and middle (right and middle) steps in the path from the root r
to v, for every node v in T. Then, for every edge e in the layout $\Psi$:

1. $\mathcal{H}_\Psi(v(e)) = d_T^{RM}(t(e))$, and

2. $level(e) = d_T^{LM}(t(e))$.

Proposition 2. Given a non-crossing tree layout $\Psi$ over a physical network, let
$T = t(\Psi)$ be the ternary tree assigned to it. Then $\mathcal{L}(e') = level(\phi(e'))$ for every
edge e' in the physical network.

Given a layout $\Psi$ over a chain network, if we consider the multiset
$\{d_T^{RM}(v) \mid v \in t(\Psi)\}$ we get exactly the multiset of hop counts of the vertices of
this network (by Theorem 12), and if we consider the multiset $\{d_T^{LM}(v) \mid v \in t(\Psi)\}$
we get exactly the multiset of loads of the physical links of this network (by
Theorem 12 and Proposition 2). By using this and finding the dual layout $\bar\Psi$ with the
multisets $\{d_{\bar T}^{RM}(v) \mid v \in t(\bar\Psi)\}$ of hop counts of its vertices and $\{d_{\bar T}^{LM}(v) \mid v \in t(\bar\Psi)\}$
of loads of its physical edges, we observe that the multiset of hop counts of
$\Psi$ is exactly the multiset of loads of $\bar\Psi$, and the multiset of loads of $\Psi$ is also the
multiset of hop counts of $\bar\Psi$, thus deriving a complete combinatorial explanation
for the symmetric results of either the worst-case trees or average-case trees in
the general path case.
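The exchange of the two multisets under reflection can be checked mechanically. The following is a minimal sketch, not the construction of the layouts themselves: the class `Node` and the functions `reflect` and `depth_multisets` are hypothetical names introduced here, and the small tree `example` is arbitrary. It only illustrates that swapping left and right children swaps the counts of "left and middle" and "right and middle" steps.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """A ternary tree node; any of the three children may be absent."""
    left: Optional["Node"] = None
    middle: Optional["Node"] = None
    right: Optional["Node"] = None


def reflect(t):
    """Step 2: exchange left and right children everywhere; middle stays."""
    if t is None:
        return None
    return Node(reflect(t.right), reflect(t.middle), reflect(t.left))


def depth_multisets(t):
    """Return two Counters over all nodes: one plus the number of
    left/middle steps from the root, and one plus the number of
    right/middle steps from the root."""
    lm, rm = Counter(), Counter()
    stack = [(t, 1, 1)]
    while stack:
        node, d_lm, d_rm = stack.pop()
        if node is None:
            continue
        lm[d_lm] += 1
        rm[d_rm] += 1
        stack.append((node.left, d_lm + 1, d_rm))        # left step
        stack.append((node.middle, d_lm + 1, d_rm + 1))  # middle step
        stack.append((node.right, d_lm, d_rm + 1))       # right step
    return lm, rm


# An arbitrary small ternary tree used only for illustration.
example = Node(left=Node(middle=Node()), right=Node(left=Node(), right=Node()))
lm, rm = depth_multisets(example)
lm_reflected, rm_reflected = depth_multisets(reflect(example))
```

In this sketch the two Counters of the reflected tree are the two Counters of the original tree in exchanged roles, which is the tree-level form of the exchange of hop counts and loads described above.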

Following the above discussion, we obtain exact analogues of the four theorems
(Theorems 7, 8, 9 and 10) for the general path case layouts.

5 Duality: Lattice Paths and High-Dimensional Spheres 

5.1 Lattice Paths 

The recurrence relation (1) clearly corresponds to the number of lattice paths
from the point (0,0) to the point $(\mathcal{L},\mathcal{H})$, that use only horizontal (right) and
vertical (up) steps.

In Fig. 5 each lattice point is labeled with the number of lattice paths from
(0,0) to it; the calculation is done by the recurrence relation (1). For the case
$\mathcal{L} = 3$ and $\mathcal{H} = 2$ one gets 10; this corresponds to the number of nodes in
the tree $T_{short}(3,2)$ (see Fig. 3), and to the number of paths that go from (0,0)
to (3,2).

[Figure: lattice grid with point labels; recovered axis label: hops.]

Fig. 5. Lattice paths with regular steps

The recurrence relation (4) clearly corresponds to the number of lattice paths
from the point (0,0) to the point $(\mathcal{L},\mathcal{H})$ that use horizontal (right), vertical (up),
and diagonal (up-right) steps. In Fig. 6 each lattice point is labeled with the
number of lattice paths from (0,0) to it. For the case $\mathcal{L} = 3$ and $\mathcal{H} = 2$ one gets
25 such paths. This corresponds to the number of nodes in the tree T(3,2)
(see Fig. 7), which is constructed of two trees glued at their roots: the one depicted
in Fig. 3 (and containing 13 vertices), and its corresponding reverse tree.

[Figure: lattice grid with point labels; recovered axis label: hops.]

Fig. 6. Lattice paths with regular and diagonal steps

We also refer to these lattice paths in Section 5.2.
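The two path counts quoted above can be reproduced with a few lines of code. This is a sketch under an assumption: recurrences (1) and (4) are not reproduced in this part of the text, so the function `paths` (a name introduced here) uses the standard path-counting recurrences, in which a point is reached from its left neighbour, from its lower neighbour and, when diagonal steps are allowed, from its lower-left neighbour.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def paths(l, h, diagonal=False):
    """Number of lattice paths from (0, 0) to (l, h) using right and up
    steps, plus up-right diagonal steps when `diagonal` is True."""
    if l == 0 or h == 0:
        return 1  # only one straight path along an axis
    total = paths(l - 1, h, diagonal) + paths(l, h - 1, diagonal)
    if diagonal:
        total += paths(l - 1, h - 1, diagonal)
    return total


regular = paths(3, 2)            # right and up steps only
with_diagonals = paths(3, 2, True)  # right, up and diagonal steps
```

With right and up steps only, the count agrees with the binomial coefficient `comb(l + h, h)`; with diagonal steps it is the value quoted for T(3,2).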
Fig. 7. The tree T(3, 2)



5.2 Spheres 

Consider the set of lattice points (that is, points with integral coordinates)
of an $\mathcal{L}$-dimensional $\ell_1$-sphere of radius $\mathcal{H}$. The points in this sphere are
$\mathcal{L}$-dimensional vectors $v = (v_1, v_2, \ldots, v_{\mathcal{L}})$, where $|v_1| + |v_2| + \ldots + |v_{\mathcal{L}}| \le \mathcal{H}$. Let
$|Sp(\mathcal{L},\mathcal{H})|$ be the number of lattice points in this sphere. Let $|Rad(\mathcal{N},\mathcal{L})|$ be
the radius of the smallest $\mathcal{L}$-dimensional $\ell_1$-sphere containing at least $\mathcal{N}$ internal
lattice points.

It can be shown that 

Theorem 13. The tree $T(\mathcal{L},\mathcal{H})$ contains $|Sp(\mathcal{L},\mathcal{H})|$ vertices.

The exact number of points in this sphere is given by equation (5). (This was
studied, in connection with codewords, in [GW70].)

Moreover, we can show that 

Theorem 14. Consider a chain of $\mathcal{N}$ vertices and a maximal load requirement
$\mathcal{L}$. Then $\mathcal{H}_{opt}(\mathcal{N},\mathcal{L}) = |Rad(\mathcal{N},\mathcal{L})|$.

This theorem is proved by showing a one-to-one mapping from the nodes
of any layout with hop count bounded by $\mathcal{H}$ and load bounded by $\mathcal{L}$ into the
$\mathcal{L}$-dimensional sphere of radius $\mathcal{H}$. This mapping turns out to be very useful
in the analysis of this and related analytical results (see also Section 6).

Using the above correspondences and discussion, it is possible to show that,
for either the shortest paths case or the general case, any optimal layout $\Psi$ with
$\mathcal{N}$ nodes, load bounded by $\mathcal{L}$ and optimal hop count, has also optimal average
hop count regarding layouts with load bounded by $\mathcal{L}$, and that any optimal
layout $\Psi$ with $\mathcal{N}$ nodes, hop count bounded by $\mathcal{H}$ and optimal load, has also
optimal average load regarding layouts with hop count bounded by $\mathcal{H}$.

We now sketch a one-to-one mapping between the set of lattice points of
the $\mathcal{L}$-dimensional sphere of radius $\mathcal{H}$ and the set of lattice paths from (0,0)
to $(\mathcal{L},\mathcal{H})$ that use horizontal, vertical or (up-)diagonal steps. We first describe
a function which maps every vector $v = (v_1, \ldots, v_{\mathcal{L}})$ in $Sp(\mathcal{L},\mathcal{H})$ into such a
lattice path. Starting from (0,0) make $|v_1|$ vertical steps and one horizontal step,
make $|v_2|$ vertical steps and one horizontal step, ..., make $|v_{\mathcal{L}}|$ vertical steps and
one more horizontal step, ending with $\mathcal{H} - \sum_i |v_i|$ vertical steps, so that the
path ends at $(\mathcal{L},\mathcal{H})$. After that,
for every negative component $v_i$ of v, we replace the $|v_i|$-th vertical step and
the subsequent horizontal step done during the translation of this component
by an (up-)diagonal step. A close look at the properties of these paths enables
us to further explore the properties of these trees. Returning to the discussion
of the layouts $T_{short}(\mathcal{L},\mathcal{H})$ that use only shortest paths, it is possible to find a
similar correspondence between the vertices of these trees and lattice paths from
(0,0) to $(\mathcal{L},\mathcal{H})$ that use only vertical and horizontal steps, and to view some
properties of these trees using these lattice paths.
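The mapping from sphere points to lattice paths can be tried out on the small case of dimension 3 and radius 2. The sketch below rests on two assumptions that should be read as such: the closing steps of each path are taken to be vertical, so that every path ends at (dimension, radius), and the names `sphere_points`, `to_path` and `endpoint` are introduced here for illustration only. Paths are encoded as strings over U (up), R (right) and D (up-right diagonal).

```python
from itertools import product


def sphere_points(dim, radius):
    """All integer vectors of length `dim` whose absolute values sum to
    at most `radius`."""
    rng = range(-radius, radius + 1)
    return [v for v in product(rng, repeat=dim) if sum(map(abs, v)) <= radius]


def to_path(v, radius):
    """Translate a sphere point into a lattice path string."""
    steps = []
    for comp in v:
        block = ["U"] * abs(comp) + ["R"]
        if comp < 0:
            # Replace the last vertical step of this component and the
            # horizontal step that follows it by one diagonal step.
            block[-2:] = ["D"]
        steps.extend(block)
    # Closing vertical steps (assumption: vertical, to reach height `radius`).
    steps.extend(["U"] * (radius - sum(map(abs, v))))
    return "".join(steps)


def endpoint(path):
    """Coordinates reached by a path string."""
    x = sum(ch in "RD" for ch in path)
    y = sum(ch in "UD" for ch in path)
    return x, y


points = sphere_points(3, 2)
paths = {to_path(v, 2) for v in points}
```

For this case the 25 sphere points yield 25 distinct paths, all ending at (3, 2), which matches the number of lattice paths with diagonal steps quoted in Section 5.1.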

6 Discussion 

We presented the dual role played by the parameters of load and hop count in 
optimal designs of virtual path layouts in ATM chain networks, for the cases 
of shortest paths routes and the general case. We discussed these dualities with 
the aid of recurrence relations, one-to-one correspondences with binary trees (for 
the shortest paths case) and ternary trees (for the general case), lattice paths 
(that use horizontal and vertical steps for the shortest path case, and that also 
use diagonal steps for the general case), and high dimensional spheres (in the 
general case). These dualities shed light on the structure of the optimal solutions,
and simplify some of the proofs.

It might be of interest to further explore such duality relations between these
and corresponding parameters (such as load measured at vertices, e.g., [FNP97]),
also for other topologies (such as trees ([CGZ94, G95]), meshes or planar graphs
([BG97, BBGG97, G95])); one might also consult the survey in [Z97] for a general
discussion of these extensions.

Of special interest is the use of geometry, presented in Section 5.2. This
approach provides a clear view of the structure of the solution, and enabled
improving results for the all-to-all problem on a ring network (see [DFZ97] for
details).

Acknowledgments: I would like to thank my coauthors (Marcelo Feighelstein, 
Ori Gerstel, Avishai Wool, Israel Cidon and Yefim Dinitz), and Renzo Sprug-

Abstract. It is proved, sharpening previous results of Scheinerman and
by analysing an algorithm, that the independence number of the random
interval graph, defined as the intersection graph of n intervals whose end
points are chosen at random on [0,1], concentrates around $2\sqrt{n/\pi}$.



Key Words: Random Graphs, Analysis of Algorithms, Probabilistic Methods 

1 Introduction 

Scheinerman [1] defined a random interval graph on n vertices as the intersection
graph of n intervals $[X_1,Y_1], \ldots, [X_i,Y_i], \ldots, [X_n,Y_n]$ whose end points are chosen
at random on [0,1]. Hence here we start with 2n random variables
$Z_1, Z_2, \ldots, Z_{2n}$ independently and uniformly distributed on [0,1] and we put, for
$1 \le i \le n$, $X_i = \min\{Z_{2i-1}, Z_{2i}\}$ and $Y_i = \max\{Z_{2i-1}, Z_{2i}\}$. Scheinerman
derived many interesting properties of these graphs. Here we answer one of the
questions that Scheinerman left open, namely we derive an asymptotic equivalent
for the independence number of these graphs. The main ingredient in our proof
is the analysis of an algorithm. Recall that the independence number of a graph
is the maximum cardinality of a subset of vertices which span no edge.

Theorem Let $G_n$ denote the random graph defined as the intersection graph
of n intervals whose end points are chosen at random on [0,1]. The independence
number $\alpha(G_n)$ of this graph satisfies

$$\frac{\alpha(G_n)}{\sqrt{n}} \to \frac{2}{\sqrt{\pi}}$$

in probability as $n \to \infty$.

2 Proof of the Theorem 

The proof uses a greedy algorithm. Let again $[X_1,Y_1], \ldots, [X_i,Y_i], \ldots, [X_n,Y_n]$
denote our random intervals. We call $X_i$ (resp. $Y_i$) the left (resp. the right) end of
$[X_i,Y_i]$. We put $I_i = [X_i,Y_i]$. We also denote by $I_i$ the vertex of $G_n$ corresponding
to the interval $I_i$. We say that $[X_j,Y_j]$ lies at the right of $[X_i,Y_i]$ if we have
$X_j > Y_i$. It is clear that we can assume that the leftmost interval representing
a vertex of an independent set of maximum cardinality is the interval $I_i$ with
leftmost right end. This implies by induction that the independent set $\{J_1, \ldots, J_k\}$
of $G_n$ given by the following algorithm has maximum cardinality (here each $J_k$ is
equal to some $I_i$).

Algorithm Alfin

1. Define $J_1$ as the interval $I_i$ with leftmost right extremity and set k = 1.

2. If there is no interval $I_i$ lying at the right of $J_k$ put $\alpha(G_n) = k$ and stop.
Else define $J_{k+1}$ as the interval $I_i$ lying at the right of $J_k$ which has the leftmost
right extremity, set k = k + 1 and go to 2.

This concludes the description of our algorithm. 
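The two steps of the algorithm can be sketched in a few lines. The function names `greedy_independent_set` and `brute_force_alpha` are introduced here for illustration; scanning the intervals in order of increasing right end and keeping each interval that lies at the right of the last one kept is one possible way to realise the rule "leftmost right extremity among the intervals lying at the right of $J_k$". The brute-force routine is only a cross-check for very small inputs.

```python
from itertools import combinations


def greedy_independent_set(intervals):
    """Return pairwise disjoint intervals chosen greedily by right end."""
    chosen = []
    last_right = float("-inf")
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:  # lies at the right of the last chosen interval
            chosen.append((left, right))
            last_right = right
    return chosen


def brute_force_alpha(intervals):
    """Independence number by exhaustive search (small inputs only)."""
    def disjoint(a, b):
        return a[1] < b[0] or b[1] < a[0]

    for size in range(len(intervals), 0, -1):
        for subset in combinations(intervals, size):
            if all(disjoint(a, b) for a, b in combinations(subset, 2)):
                return size
    return 0


# A hand-made example; the endpoints are arbitrary.
intervals = [(0.1, 0.3), (0.2, 0.6), (0.4, 0.5), (0.55, 0.9), (0.7, 0.8)]
selected = greedy_independent_set(intervals)
```

On this example the greedy scan keeps (0.1, 0.3), (0.4, 0.5) and (0.7, 0.8), and exhaustive search confirms that no four of the intervals are pairwise disjoint.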



2.1 Some Preliminary Results 



We begin by restating for ease of reference two well known inequalities concerning 
the tail of the Binomial distribution. 

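Fact 1 (Hoeffding-Chernoff bounds) Let $S_{n,p}$ denote the sum of n {0,1}-valued
independent random variables $X_1, \ldots, X_n$ with $P[X_i = 1] = p$, $1 \le i \le n$.
We have then, for $0 < \epsilon < 1$,

$$P[S_{n,p} \le (1-\epsilon)np] \le \exp(-\epsilon^2 np/2) \qquad (1)$$

and

$$P[S_{n,p} \ge (1+\epsilon)np] \le \exp(-\epsilon^2 np/3). \qquad (2)$$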

We will need the following easy results concerning the distribution of the $I_j$'s.

Fact 2 Suppose that $I_1, \ldots, I_m$ are m random intervals contained in the segment
$J = [0, l]$ and let x denote the smallest distance between the left end of J
and the right ends of the $I_j$'s. We have

$$E(x) \le \frac{l}{2}\sqrt{\frac{\pi}{m}}$$

and

$$E(x^2) \le \frac{l^2}{m}.$$

Proof. The probability that the right end of $I_1$ lies at a distance greater than x
of the left end of J is equal to $1 - (x/l)^2$. The probability that this is true for
every interval $I_i$ is thus equal to $[1 - (x/l)^2]^m$. Hence we have

$$E(x) = -\int_0^l t\, d[1 - (t/l)^2]^m = \int_0^l [1 - (t/l)^2]^m\, dt \le \int_0^\infty e^{-mt^2/l^2}\, dt = \frac{l}{2}\sqrt{\frac{\pi}{m}}$$

and

$$E(x^2) = -\int_0^l t^2\, d[1 - (t/l)^2]^m = 2\int_0^l [1 - (t/l)^2]^m\, t\, dt \le 2\int_0^\infty e^{-mt^2/l^2}\, t\, dt = \frac{l^2}{m}.$$

□

Fact 3 Let m(t) denote the number of intervals $I_i$ which lie completely in
the interval [1 - t, 1]. We have, with probability 1 - o(1),

$$nt^2 - 5t\sqrt{n\log n} \le m(t) \le nt^2 + 5t\sqrt{n\log n}, \qquad n^{[\ldots]} \le t \le 1.$$

Proof. Use the fact that m(t) is for fixed t a binomial random variable with
parameters n and $p = t^2$. □



2.2 Analysis of the Algorithm Alfin 

We set $x_0 = 1$ and for each $i \ge 1$ we denote by $x_i$ the distance between the
rightmost extremity of the interval $J_i$ and the right extremity of [0,1]. We denote
by $n_i$ the number of intervals $I_j$ which lie at the right of $J_i$. (Here $x_i$ and $n_i$ are
random variables which are defined for each value of i which does not exceed
the independence number.) Let us first observe that, since the restriction of the
uniform distribution on [0,1] to any subinterval is again uniform, it follows that,
conditionally on $x_i$ and $n_i$, the $n_i$ intervals $I_j$ which are contained in $[1 - x_i, 1]$ are
independently and uniformly distributed on this interval.

Let us denote by $B_i$ the $\sigma$-field generated by the random variables $x_0, n_0,
x_1, n_1, \ldots, x_i, n_i$. Let us define

$$i_0 = \max\{j : m(x_i) \ge n x_i^2 - x_i\sqrt{n\log n},\ 1 \le i \le j\}$$

and $j_0$ as the last value (if any) of the index i for which the inequality
$x_i \ge n^{-1/4}\log^{1/2} n$ is satisfied. If there is no such value we set $j_0 = n$. Let
$k_0 = \min\{i_0, j_0\}$. We define a new process $(y_i, m_i)$ by putting $y_i = x_i$ if $i \le k_0$ and
$y_i = x_{k_0}$ ($= y_{k_0}$) if not. Obviously this new process is also measurable relatively
to the family of $\sigma$-fields $(B_i)$. Since the conditional expectation of the difference
$\delta_{i+1} = y_i - y_{i+1}$ is, for a given $y_i$, a decreasing function of $n_i$, we have, for
$i \le k_0 - 1$, using Fact 2 with $l = y_i$, $m = n y_i^2 - y_i\sqrt{n\log n}$,

$$E_{B_i}(\delta_{i+1}) \le \frac{y_i}{2}\sqrt{\frac{\pi}{n y_i^2 - y_i\sqrt{n\log n}}} \le \frac{1}{2}\sqrt{\frac{\pi}{n(1 - n^{-1/4})}}$$

and this inequality is obviously also true for $i \ge k_0$ since then $\delta_{i+1}$ vanishes. It
implies that the sequence $(z_i)$ defined by

$$z_i = y_i + \frac{i}{2}\sqrt{\frac{\pi}{n(1 - n^{-1/4})}} \qquad (3)$$

is a supermartingale relatively to the family of $\sigma$-fields $(B_i)$. Let us put
$l = 2(n/\pi)^{1/2}(1 - [\ldots])$. We have

$$\mathrm{Var}\, z_l \le \sum_{i=1}^{l} \mathrm{Var}\, \delta_i \le \frac{Cl}{n}$$

since, by Facts 2 and 3, each of the terms in this sum is bounded above by
C/n, where C is an absolute constant. Observing that $E z_i \ge 1$ and using
Kolmogorov's inequality for martingales, we get

$$P[z_i \ge 1 - n^{-1/4}\log^{1/2} n,\ 1 \le i \le l] \ge 1 - O\!\left(\frac{1}{\log n}\right).$$

Replacing i by l in (3) we get the inequality $y_l \ge z_l - 1 + [\ldots]\, n^{-1/4}\log^{1/2} n$, which
gives, with the preceding inequality,

$$P[y_l \ge 3n^{-1/4}\log^{1/2} n] = 1 - o(1),$$

that is, with probability 1 - o(1), we have $l < j_0$. Since we have also $l < i_0$
with probability 1 - o(1) it follows that, again with probability 1 - o(1), the
process $(y_i)$ coincides with the process $(x_i)$ up to time l. This means that the
independence number of our interval graph is at least $l = 2(n/\pi)^{1/2}(1 - o(1))$ and
concludes the first part of the proof. For the second part, that is in order to
prove that the independence number is bounded above by $2(n/\pi)^{1/2}(1 + o(1))$, it suffices
to repeat essentially the same arguments, using inequalities reverse to those we
have used. The details are omitted.
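As an informal complement to the proof, the greedy algorithm can be run on simulated intervals and the result compared with $2\sqrt{n/\pi}$. This is only a sketch: the helper names, the sample size n = 20000 and the seed are arbitrary choices made here, and a single run illustrates the statement without proving anything about the rate of convergence.

```python
import math
import random


def independence_number(intervals):
    """Greedy count: scan by right end, keep intervals lying at the right
    of the last one kept."""
    count, last_right = 0, -1.0
    for left, right in sorted(intervals, key=lambda iv: iv[1]):
        if left > last_right:
            count += 1
            last_right = right
    return count


def random_intervals(n, rng):
    """n intervals whose two end points are independent and uniform on [0, 1]."""
    out = []
    for _ in range(n):
        a, b = rng.random(), rng.random()
        out.append((min(a, b), max(a, b)))
    return out


rng = random.Random(0)   # fixed seed, chosen arbitrarily
n = 20000
alpha = independence_number(random_intervals(n, rng))
ratio = alpha / (2 * math.sqrt(n / math.pi))
```

For large n the quantity `ratio` should be close to 1 if the theorem is read as stated; how close it is for a given n is not addressed by this sketch.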

3 Conclusion 

By analysing an algorithm, we have obtained an asymptotic equivalent to the 
independence number of a random interval graph in the sense of Scheinerman. 
An open problem is to find an asymptotic equivalent to the independence number 
of a genuine random interval graph, in the model where every possible interval 
graph on n vertices is equally likely.

Abstract. We consider strategies for full backups from the viewpoint
of competitive analysis of online problems. We concentrate upon the re- 
alistic case that faults are rare, i.e. the cost of work between two faults is 
typically large compared to the cost of one backup. Instead of the (worst- 
case) competitive ratio we use a refined and more expressive quality 
measure, in terms of the average fault frequency. This is not standard in 
the online algorithm literature. The interesting matter is, roughly speak- 
ing, to adapt the backup frequency to the fault frequency, while future 
faults are unpredictable. We give an asymptotically optimal determin- 
istic strategy and propose a randomized strategy whose expected cost 
beats the deterministic bound. 



1 Introducing the Backup Problem 

The main method to protect data from loss (due to power failure, physical de- 
struction of storage media, deletion by accident etc.) is to save the current status 
of a file or project from time to time. Such a full backup incurs some cost, but 
loss of data is also very costly and annoying, and faults are unpredictable. So it 
is a natural question what competitive analysis of online problems can say about
backup (or autosave) strategies. 

We consider the following basic model. Some file (or file system, project etc.) 
is being edited, while faults can appear. The cost of work per time is assumed 
to be constant. Every backup incurs a fixed cost as well. We may w.l.o.g. choose 
the time unit and cost unit in such a way that every unit of working time and 
every backup incurs cost 1. In case of a fault, all work done after the most 
recent backup is lost and must be repeated. Before this, we have to recover the 
last consistent status from the backup, which incurs cost R (a constant ratio of 
recovery and backup cost). The goal is to minimize the total cost of a piece of 
work, which is the sum of costs for working time (including repetitions), backups 
and recoveries. 

This seems to be the simplest reasonable model and a good starting point
for studying online backup strategies. Perhaps the main criticism concerns
the constant backup cost. Usually this cost depends somehow on the amount of
changes (as with incremental backups). However, the constant cost assumption
is also suitable in some cases, e.g. if the system always saves the entire file even
though the changes are minor updates, or if the save operation has large constant setup
cost whereas the amount of data makes no difference.

Extended models may, of course, take more aspects into account: faults which
have more fatal implications than just repeated work (such as loss of hardly
retrievable data etc.), backups which may be faulty in turn, repeated work which
is done faster than the first time, to mention a few.

It is usual in competitive analysis to compare the costs of an online strategy
to the costs of a clairvoyant (offline) strategy which has prior knowledge of the
problem instance (here: the times when faults appear). Their worst-case ratio is
called the competitive ratio. We remark that, for our problem, an optimal offline
strategy is easy to establish but not useful in developing online strategies, so we
omit this subject.

Only a few online problems have been considered where a problem instance
merely consists of a sequence of points in time: rent-to-buy (or spin-block),
acknowledgement delay, and some special online scheduling problems fall into
this category, cf. the references.



2 Rare Faults and a Reformulation 



Consider a piece of work that requires p time units and is to be carried out
nonstop. A time interval I of length p is earmarked for this job. Let n be the
number of faults that will appear during I. The ratio f = n/p is referred to
as the average fault frequency, with respect to I. In most realistic scenarios f
is quite small compared to 1, i.e. the time equivalent of the cost of one backup
is much smaller than the average distance between faults, thus we will focus
attention on this case of rare faults.

If the online player knew f in advance (but not the times at which the faults
appear), it would not be a bad idea to make a backup every $1/\sqrt{f}$ time units.
Namely, the backup cost per time unit is $\sqrt{f}$, and every fault destroys work to
the value of at most $1/\sqrt{f}$, hence the average cost of work to be repeated is
bounded by $\sqrt{f}$ per time unit. The fraction of work which got lost must
be made up later, immediately after I. New faults can occur in this extra time
interval, but this adds lower-order terms to the costs. Hence the cost per time
is at most $1 + 2\sqrt{f} + Rf$, and the stretch (i.e. ratio of completion time and
productive time) is $1 + \sqrt{f}$, subject to O(f) terms.
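The choice of the backup period can be illustrated by a small calculation. The sketch below ignores the recovery cost R and all lower-order terms, and the function name `periodic_excess_rate` is introduced here: with one backup every `period` time units and f faults per time unit, each fault destroying at most `period` units of work, the worst-case extra cost per time unit is 1/period + f * period.

```python
import math


def periodic_excess_rate(period, f):
    """Worst-case extra cost per time unit for periodic backups:
    backup cost 1/period plus lost work of at most f * period."""
    return 1.0 / period + f * period


f = 0.0001                       # an arbitrary small fault frequency
best_period = 1.0 / math.sqrt(f)  # the period suggested in the text
best_rate = periodic_excess_rate(best_period, f)
```

For this value of f the period is 100 time units and the rate equals $2\sqrt{f}$; shorter or longer periods give a larger value of the same expression.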

An offline player can trivially succeed with 1 + (1 + R)f = 1 + O(f) average
cost per time, making a backup immediately before each fault. (This is not
necessarily optimal.) We shall see below that any online strategy incurs cost at
least $1 + \sqrt{f}$ per time unit, in the worst case. Hence the competitive ratio still
behaves as $1 + \Omega(\sqrt{f})$, for small f. This suggests to simply consider the cost per
time incurred by an online strategy, rather than the slightly smaller competitive
ratio. In particular, the constant R we have preliminarily introduced appears in
O(f) only, thus we will suppress it henceforth. We also omit the summand 1 for
normal work and merely consider the additional cost for backups and lost (i.e.
repeated) work per time. Throughout the paper, this is called the excess rate.

The next simple observation shows that the average fault frequency is intrinsic
in the excess rate, as announced. (Similarly one can realize that no competitive
online strategy exists if faults are unrestricted.)

Proposition 1. For any fault frequency f, even if the online player knows f
beforehand, no deterministic backup strategy can guarantee an excess rate below
$\sqrt{f}$.

Proof. An adversary partitions the time axis into phases of length 1/f and places
one fault in each phase by the following rule: If there occurs a gap of $1/\sqrt{f}$ time
units between backups then the adversary injects a fault there, hence the online
player loses $1/\sqrt{f}$ time, so the ratio of lost time is $\sqrt{f}$. Otherwise, if the distance
between consecutive backups was always smaller than $1/\sqrt{f}$ then more than
$1/\sqrt{f}$ backups have been made, so the backup cost per time is at least $\sqrt{f}$. In
this case the adversary injects a fault at the end of the phase, to keep the fault
frequency f. □



Thus the excess rate is some $c\sqrt{f}$, and the main interesting matter is to adapt
the backup frequency so as to achieve the best coefficient c, under the realistic
assumption that the online player has no previous knowledge of the faults at all.

Some remarks suggest that other objectives would be less interesting: (1)
One might also study the excess rate in terms of the smallest fault distance d,
rather than the fault frequency f. However this is not very natural, since a pair
of faults occurring close together may be an exceptional event, and d can only
decrease in time, so using d as a parameter would yield too cautious strategies.
Moreover note that, trivially, any online strategy with excess rate $c\sqrt{f}$ has the
upper bound $c/\sqrt{d}$, too. (2) For a given strategy one may easily compute that f
maximizing $(1 + c\sqrt{f})/(1 + (1 + R)f)$, thus estimating the worst-case competitive
ratio, but this number is less meaningful than c itself.

Clearly a $c\sqrt{f}$ upper bound can only hold in case f > 0, in other words,
if at least one fault occurs. If f = 0 then already the first backup yields an
infinite coefficient, but if the online player makes no backups at all, speculating
on absence of faults, the adversary can foil him by a late fault, which also yields
a coefficient not bounded by any constant.

An elegant viewpoint avoiding this f = 0 singularity is the following
reformulation of the problem. Consider a stream of work whose length is not a priori
bounded. Given n, what c can be achieved for the piece of work up to the n-th
fault? (If the n-th fault appears after p time units, f is understood to be n/p.)
We refer to the corresponding strategies as n-fault strategies. This version of our
problem is also supported by

Proposition 2. If we partition, in retrospect, the work time interval arbitrarily
into phases each containing at least one fault, such that we have always achieved
an excess rate $c\sqrt{f_i}$, where $f_i$ is the fault frequency in the i-th phase, then the
overall excess rate is bounded by $c\sqrt{f}$.

Proof. Let the i-th phase have length $p_i$ and contain $n_i > 0$ faults. The cost
of the i-th phase is by assumption $p_i c\sqrt{n_i/p_i} = c\sqrt{n_i p_i}$. Exploiting an elementary
inequality, the total cost is bounded by

$$c\sum_i \sqrt{n_i p_i} \le c\sqrt{\sum_i n_i \sum_i p_i} = c\sqrt{np} = pc\sqrt{f}.$$

□






Therefore, once we have an n-fault strategy with excess rate $c\sqrt{f}$, we may
apply it repeatedly to phases of n faults each, thus keeping an overall excess rate
$c\sqrt{f}$.

Let us summarize this section: Instead of the competitive ratio we use a
nonstandard measure for online backup strategies in terms of an input parameter
(fault frequency f), called the excess rate. It behaves as $c\sqrt{f}$ and is much more
expressive than a single number for the worst case, as the extra costs heavily
depend on the fault frequency. Moreover, we consider an arbitrarily long piece
of work until the n-th fault appears, and we try to minimize c for given n.

In the following sections we develop concrete n-fault strategies. In the proofs, 
we first consider a sort of continuous analogue of the discrete problem. Doing so 
we can first ignore tedious technicalities and conveniently obtain a heuristic so- 
lution which is then discretized. The bound for the discrete strategy is rigorously 
verified afterwards. 

3 Deterministic n-Fault Strategies 

We first settle the case n = 1.

Theorem 1. There exists a deterministic 1-fault strategy with $c = \sqrt{8}$, and this
is the optimal constant coefficient.

Proof. Work begins w.l.o.g. at time 0. A backup strategy is specified by the
integer-valued function y(x) describing the number of backups made before time
$x^2$. (This quadratic scale will prove convenient.) In order to get a heuristic
solution, we admit differentiable real functions y instead of integer-valued ones.
That is, we provisionally fix the asymptotic growth of backup numbers in time
only, but not the particular backup times.

We have to assign suitable costs to such functions. Assume that the fault
occurs between time $(x - 1)^2$ and $x^2$. Then the cost of backups and lost work
incurred so far is bounded by $y(x) + 2x/y'(x)$. Namely, at most y(x) backups
have been made, and, in the worst case, the fault appears immediately before
a planned backup, hence the lost time may equal the distance of consecutive
backups. This distance can be roughly estimated as $2x/y'(x)$, since $y'(x)$ is the
backup density on the quadratic scale, and $x^2 - (x - 1)^2 < 2x$.

Remember that the excess rate $c\sqrt{f}$ is the cost per time. Since $f = 1/x^2$, the
coefficient c is the cost divided by x. So our y and c must satisfy $y(x) + 2x/y'(x) \le
cx$ for all x. We can assume equality, since every backup may be deferred until
coefficient c is reached. The resulting differential equation $y + 2x/y' = cx$ with
y(0) = 0 has the solution y = ax with suitable constant a. Substitution yields
a + 2/a = c. The optimal $c = \sqrt{8}$ is achieved with $a = \sqrt{2}$.

Translating this back, let us make the x-th backup at time $x^2/2$. It is not
hard to verify accurately that this strategy has excess rate bounded by $\sqrt{8}\sqrt{f}$:
Let the fault appear at time $u^2$ with $x^2/2 \le u^2 < (x + 1)^2/2$. The coefficient of
$\sqrt{f}$ at this moment is obviously

$$c = \frac{x + u^2 - x^2/2}{u}.$$

This term is monotone increasing in u within this interval, so we may consider
$u^2 = (x + 1)^2/2$, implying

$$c = \sqrt{2}\,\frac{2x + 1/2}{x + 1} < 2\sqrt{2}.$$

□



Note that optimality holds only in an asymptotic sense, i.e. for $f \to 0$. The
coefficient is $\sqrt{8}$ minus some term vanishing with f. It might be interesting to
analyze this lower-order term, too.
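The bound for the strategy that makes its x-th backup at time $x^2/2$ can also be checked numerically. In the sketch below, the function name `coefficient` and the sampling grid are choices made here for illustration: for a single fault at time $u^2$ the function counts the backups made so far, adds the work lost since the last backup, and divides by u, which is the coefficient of $\sqrt{f}$ for $f = 1/u^2$.

```python
import math


def coefficient(u):
    """Coefficient of sqrt(f) if the only fault occurs at time u**2 and
    the x-th backup (x = 1, 2, ...) is made at time x**2 / 2."""
    t = u * u
    x = int(math.sqrt(2.0 * t))   # largest x with x**2 / 2 <= t
    while x * x / 2.0 > t:        # guard against floating-point rounding
        x -= 1
    cost = x + (t - x * x / 2.0)  # backups made plus lost work
    return cost / u


# Sample fault times on a grid; the grid is arbitrary.
worst = max(coefficient(k / 100.0) for k in range(1, 20001))

# A fault just before the 1001st backup, close to the worst case for x = 1000.
near_boundary = coefficient(1001 / math.sqrt(2.0) - 1e-6)
```

On the sampled grid the coefficient stays below $2\sqrt{2}$, and faults placed just before a late backup come close to that value, in line with the remark that the bound is attained only asymptotically.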

Next we extend the idea to n faults. Here the coefficient improves upon the
1-fault optimum, if we combine n single-fault phases appropriately: Note that
the inequality used in Proposition 2 is tight for equal-length phases only, so it
should be possible to beat $\sqrt{8}$ by adapting the backup frequency. A more intuitive
explanation of this effect is that the online player learns, with each fault, more
about the parameter f which describes an average behaviour in time.

Lemma 1. Any deterministic n-fault strategy with excess rate $c_n\sqrt{f}$ yields a
deterministic (n + 1)-fault strategy with excess rate

$$c_{n+1}\sqrt{f} = \frac{c_n^2 n + 2}{c_n\sqrt{n^2 + n}}\,\sqrt{f}.$$



Proof. Apply the given n-fault strategy up to the n-th fault which occurs, say,
at time $z^2$. With $c := c_n$, the cost of backups and lost time until the n-th
fault is $c\sqrt{f}z^2 = c\sqrt{n}z$. Let y(x) denote the number of further backups until
time $(z + x)^2$. Assuming that the (n + 1)-th fault appears at time $(z + x)^2$ and
allowing for differentiable real functions y, the total cost up to this moment is
bounded by

$$c\sqrt{n}z + y(x) + 2(z + x)/y'(x).$$

(The arguments are the same as in Theorem 1.) On the other hand, with $C :=
c_{n+1}$, the cost up to $(z + x)^2$ is

$$C\sqrt{n + 1}\,(z + x).$$

Together this yields the differential equation

$$c\sqrt{n}z + y(x) + 2(z + x)/y'(x) = C\sqrt{n + 1}\,(z + x)$$

with y(0) = 0. One solution is given by $y(x) = c\sqrt{n}x$ and C as claimed.

Once we have derived this solution heuristically, we can verify it exactly:
Using the backup function $y(x) = c\sqrt{n}x$ means to make the k-th backup
after $z^2$ at time $(z + k/(c\sqrt{n}))^2$. Let the next fault appear at time $(z + u)^2$, with

$$\left(z + \frac{k}{c\sqrt{n}}\right)^2 \le (z + u)^2 < \left(z + \frac{k + 1}{c\sqrt{n}}\right)^2.$$

Then we have

$$C = \frac{c\sqrt{n}z + k + (z + u)^2 - (z + k/(c\sqrt{n}))^2}{\sqrt{n + 1}\,(z + u)}.$$

Considering the derivative $dC/du$ we find that C(u) can attain its maximum only
at the endpoints of the interval. In case $u = k/(c\sqrt{n})$ we get

$$C = c\sqrt{\frac{n}{n + 1}} < \frac{c^2 n + 2}{c\sqrt{n^2 + n}}.$$

So it suffices to consider $u = (k + 1)/(c\sqrt{n})$. Obvious algebraic manipulation yields, in a
few steps,

$$C = \frac{c^2 n + 2 + kc\sqrt{n}/z + (2k + 1)/(c\sqrt{n}z)}{c\sqrt{n^2 + n} + (k + 1)\sqrt{n + 1}/z}.$$

In the numerator, replace k with k + 1 and 2k + 1 with 2k + 2. Then we see

$$C < \frac{c^2 n + 2}{c\sqrt{n^2 + n}}$$

also in this case. □



Note that the excess rate at any time after the n-th fault is smaller than
$\sqrt{(n + 1)/n}\,c_n\sqrt{f}$. For $n \to \infty$ we get:

Theorem 2. There exists a deterministic backup strategy with excess rate $c_n\sqrt{f}$
after n faults, such that $\lim c_n = 2$.

Proof. Consider the sequence $c_n$ given by Lemma 1. With $s_n := c_n^2$ we get

$$s_{n+1} = \frac{(s_n n + 2)^2}{s_n(n^2 + n)}.$$

Further let $s_n = 4 + r_n/n$. By easy manipulation we obtain

$$r_{n+1} = r_n + \frac{4}{4n + r_n}.$$

Thus $r_n = O(\ln n)$, $\lim s_n = 4$, and $\lim c_n = 2$, independently of the start value.

□
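The convergence of the coefficients can be observed by iterating the recurrence of Lemma 1. This is a sketch for illustration: the function names and the number of iterations are choices made here, and the start value $\sqrt{8}$ is the 1-fault coefficient of Theorem 1.

```python
import math


def next_coefficient(c, n):
    """One step of the recurrence of Lemma 1: c_n -> c_{n+1}."""
    return (c * c * n + 2) / (c * math.sqrt(n * n + n))


def coefficients(count, c1=math.sqrt(8)):
    """Return the list [c_1, ..., c_count]."""
    cs = [c1]
    for n in range(1, count):
        cs.append(next_coefficient(cs[-1], n))
    return cs


cs = coefficients(10000)
```

Starting from $\sqrt{8}$, the second coefficient is 2.5, and the sequence decreases slowly towards 2, consistent with the logarithmic growth of $r_n$ in the proof.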
4 A Randomized Backup Strategy

As e.g. in the rent-to-buy problem [4], suitable randomization significantly
improves the expected cost against an oblivious adversary (who has no insight into
the online player's random decisions).

Theorem 3. There exists a randomized 1-fault strategy with expected excess rate
$c\sqrt{f}$ such that $\lim_{f \to 0} c = 2$.

Proof. (Sketch) We modify the deterministic strategy S of Theorem 1 which had
coefficient $c = 2\sqrt{2}$. The x-th backup is made at time $(x + r)^2/2$, where r is
a fixed number, randomly chosen from the interval [0, 1]. This randomized strategy
makes the same number of backups as our deterministic strategy did, but it is
quite clear that the expected loss of working time is about half the worst-case
loss incurred by S (subject to some failure vanishing with f, i.e. with growing
backup number). Furthermore remember that, in Theorem 1, both the backups
and the worst-case loss of time contributed the same amount $a = \sqrt{2}$ to c. We
conclude that our randomized strategy is only 3/4 times as expensive as S, which
gives $\lim c = 3/\sqrt{2}$.

We achieve the slightly better factor 2 if we make the x-th backup at time
$(x + r)^2$ instead! Namely, this reduces the backup cost by the factor $\sqrt{2}$ and
increases the expected loss by the same factor, so that both contribute 1 to c. □



For this type of randomized backup strategy (a fixed backup pattern randomly shifted on the quadratic scale), the above result is optimal, by a similar argument as in Theorem 1. It remains open whether it is optimal among all randomized strategies. We hope that a suitable application of Yao's minimax principle will provide an answer. For n faults we have:

Theorem 4. There exists a randomized backup strategy with expected excess rate $c_n\sqrt{f}$ after n faults, such that $\lim_{f\to 0}\lim_{n\to\infty} c_n = \sqrt{2}$.

Proof. (Sketch) The method of Lemma 1 of extending an n-fault strategy to an (n + 1)-fault strategy is also applicable in the randomized case, i.e. if c is the expected coefficient. If we apply a scheme as in the weaker $3/\sqrt{2}$ version of Theorem 3, the number of backups $y(x)$ is deterministic (subject to ±1 deviations), and $2(z+x)/y'(x)$ is replaced with the expected loss, i.e. multiplied by 1/2. We therefore use the modified equation

c\^z + y(x) + {z -\- x)/y'(x) = C\/n + l{z + x) 



to obtain C which is the expected Cn+i- One solution is given by y{x) = c^/nx, 
for 



C = 



c?n + 2 
cVrPd^ 



Let $s_n = c_n^2$. We get $\lim s_n = 2$ in a similar way as in Theorem 2. The straightforward calculations are omitted. □
5 Some Lower Bounds



The quite trivial Proposition 1 remains true for randomized strategies and an adaptive adversary. A stronger lower bound can be shown if the adversary has no obligation to meet some prescribed f.

Proposition 3. No backup strategy can guarantee an excess rate below $2\sqrt{f} - f$ against an adaptive adversary.



Proof. The adversary partitions the time axis into phases of some fixed length t > 2 and behaves as follows. If the online player did not make any backup in a phase, then the adversary injects a fault at the end of this phase. Let x be the fraction of phases without backup, thus ending with a fault. Here the online player pays 1 per time unit for repeated work. In the remaining 1 − x fraction of phases he pays 1/t or more per time unit for backups. Hence the average cost per time unit is at least $x + (1-x)/t$. Furthermore note that $f = x/t$. Thus the coefficient of $\sqrt{f}$ is

$$c = \left(x + \frac{1-x}{t}\right) \Big/ \frac{\sqrt{x}}{\sqrt{t}}.$$

The online player can minimize c by choosing any strategy with $x = 1/(t-1)$, which yields

$$c = \frac{2\sqrt{t-1}}{\sqrt{t}}$$

and also means $f = 1/(t(t-1))$. Now the assertion follows easily.

Note that the coefficient can be made arbitrarily close to 2 with large enough t. □
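The minimization in this proof can be sanity-checked numerically. The sketch below evaluates the coefficient $c = (x + (1-x)/t)/\sqrt{x/t}$ on a grid and compares the resulting excess rate with $2\sqrt{f} - f$. The function names, the grid resolution and the sample phase lengths are assumptions made only for this illustration.

```python
import math

def coefficient(x, t):
    """Coefficient of sqrt(f): average cost per time unit divided by sqrt(f), with f = x/t."""
    return (x + (1 - x) / t) / math.sqrt(x / t)

def best_fraction(t, steps=100_000):
    """Grid search over x in (0, 1] for the fraction of fault phases minimizing the coefficient."""
    return min((i / steps for i in range(1, steps + 1)),
               key=lambda x: coefficient(x, t))

for t in (3, 10, 100):          # hypothetical phase lengths
    x_star = 1 / (t - 1)        # minimizer stated in the proof
    f = x_star / t              # resulting fault frequency 1/(t(t-1))
    rate = coefficient(x_star, t) * math.sqrt(f)
    print(t, best_fraction(t), x_star, rate, 2 * math.sqrt(f) - f)
```

For each sample t the grid minimizer lands next to $1/(t-1)$, and the excess rate (which works out to $2/t$) stays at or above $2\sqrt{f} - f$.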



This lower bound does not contradict Theorem 4, which refers to an oblivious adversary who must fix the fault times beforehand, whereas in Proposition 3 the adversary can permanently decide whether to inject a fault or not, depending on the online player's behaviour. Thus he can also gain some information about the coin tosses in a randomized online strategy. (Of course, the oblivious adversary better reflects the real-world situation.)

In the deterministic case the adversaries all have the same power, hence it follows:

Corollary 1. No deterministic backup strategy can guarantee an excess rate better than $2\sqrt{f} - f$. □

In view of Theorem 2 this is a matching asymptotic lower bound. 

For deterministic strategies, a stronger lower bound than in Proposition 1 can be proven also for prescribed f. We state one such result:

Proposition 4. For any fault frequency f, even if the online player knows f beforehand, no deterministic backup strategy can guarantee an excess rate below $\sqrt{2}\sqrt{f}$.
Proof. An adversary partitions the time axis into phases of length $1/f$ and places one fault in each phase by the following rule: Since the online player's strategy is deterministic, the adversary knows the sequence of backups made in the next phase until a fault. W.l.o.g. the phase starts at time 0, and the backup times are $t_1, \dots, t_k$. Let $t_0 = 0$, and in case $t_k < 1/f$ further define $t_{k+1} = 1/f$. The adversary injects a fault immediately before $t_{i+1}$ such that $i + t_{i+1} - t_i$ is maximized.

The best an online player can do against this adversary's strategy is to choose his $t_i$ so as to minimize $\max_i (i + t_{i+1} - t_i)$. Obviously all these terms should be equal, thus $t_i = i t_1 - i(i-1)/2$. In particular this yields $1/f \approx k t_1 - k^2/2$. Since the adversary may place his fault at the end of the phase, both $t_1$ and k are lower bounds for the additional cost the online player incurs in this phase. By elementary calculation, $\max\{t_1, k\}$ is minimized if $t_1 = k = \sqrt{2/f}$. □
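The equalizing schedule from this proof can be tabulated for a concrete frequency. This is a sketch only: the helper name `equalizing_schedule` and the sample value $f = 1/5000$ are assumptions chosen so that $\sqrt{2/f}$ is an integer.

```python
import math

def equalizing_schedule(t1, k):
    """Backup times t_i = i*t1 - i*(i-1)/2 for i = 0..k, as in the proof."""
    return [i * t1 - i * (i - 1) / 2 for i in range(k + 1)]

f = 1 / 5000                      # hypothetical fault frequency
t1 = math.sqrt(2 / f)             # first backup time, here 100
k = round(t1)                     # number of backups in the phase
times = equalizing_schedule(t1, k)

# Cost the adversary causes by placing the fault just before t_{i+1}:
# i backups already paid plus t_{i+1} - t_i time units of lost work.
terms = [i + times[i + 1] - times[i] for i in range(k)]
print(min(terms), max(terms))     # all terms coincide with t1
print(t1 * f, math.sqrt(2 * f))   # cost per time unit vs. sqrt(2)*sqrt(f)
```

All terms coincide with $t_1$, and dividing the per-phase cost $t_1$ by the phase length $1/f$ gives the rate $\sqrt{2}\sqrt{f}$ stated in Proposition 4.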
Abstract. The investigation of the possibility to efficiently compute
approximations of hard optimization problems is one of the central and 
most fruitful areas of current algorithm and complexity theory. The aim 
of this paper is twofold. First, we introduce the notion of stability of 
approximation algorithms. This notion is shown to be of practical as well 
as of theoretical importance, especially for the real understanding of the 
applicability of approximation algorithms and for the determination of 
the border between easy instances and hard instances of optimization 
problems that do not admit any polynomial-time approximation. 
Secondly, we apply our concept to the study of the traveling salesman problem. We show how to modify the Christofides algorithm for $\Delta$-TSP to obtain efficient approximation algorithms with constant approximation ratio for every instance of TSP that violates the triangle inequality by a multiplicative constant factor. This improves the result of Andreae and Bandelt [AB95].



Keywords: Stability of approximation, Traveling Salesman Problem

1 Introduction 

Immediately after NP-hardness (completeness) [Co71] was introduced as a concept for proving intractability of computing problems, the following question was posed: If an optimization problem does not admit an efficiently computable optimal solution, is it possible to efficiently compute at least an approximation of the optimal solution? Several researchers [Jo74], [Lo75], [Chr76], [IK75] gave a positive answer for some optimization problems as early as the mid-seventies. It is a fascinating effect if one can jump from exponential complexity (a huge inevitable amount of physical work) to polynomial complexity (a tractable amount of physical work) due to a small change in the requirements: instead of an exact optimal solution one demands a solution whose cost differs from the cost of an optimal solution by at most $\varepsilon$% of the cost of an optimal solution for some $\varepsilon > 0$. This effect is very strong, especially if one considers problems for which this approximation concept works for any relative difference $\varepsilon$ (see the concept of approximation schemes in [IK75], [MPS98], [Pa94], [BC93]). This is also the reason why currently optimization problems are considered to be tractable if there exist randomized polynomial-time approximation algorithms that solve them with a reasonable approximation ratio. In what follows an $\alpha$-approximation algorithm for a minimization [maximization] problem is any algorithm that provides feasible solutions whose cost divided by the cost of optimal solutions is at most $\alpha$ [is at least $1/\alpha$].

There is also another possibility to jump from NP to P. Namely, to consider the subset of inputs with a special, nice property instead of the whole set of inputs for which the problem is well-defined. A nice example is the Traveling Salesman Problem (TSP). TSP is not only NP-hard, but also the search for an approximate solution for TSP is NP-hard for every constant approximation ratio.^1 But if one considers TSP for inputs satisfying the triangle inequality (so-called $\Delta$-TSP), one can even design a polynomial-time $\frac{3}{2}$-approximation algorithm [Chr76].^2 The situation is even more interesting if one considers the Euclidean TSP, where the distances between the nodes correspond to the distances in the Euclidean metric. The Euclidean TSP is NP-hard [Pa77], but for every $\alpha > 1$ one can design a polynomial-time $\alpha$-approximation algorithm [Ar98], [Mi96]. Moreover, if one allows randomization the resulting approximation algorithm works in $n \cdot (\log_2 n)^{O(1)}$ time [Ar97].^3 This is the reason why we propose again to revise the notion of tractability, especially because of the standard definition of complexity as the worst-case complexity: Our aim is to try to separate the easy instances from the hard instances of every computing problem considered to be intractable. In fact, by our concept, we want to attack the definition of complexity as the worst-case complexity. The approximation ratio of an algorithm is also defined in a worst-case manner. Our idea is to split the set of input instances of the given problem into possibly infinitely many subclasses according to the hardness of their approximability, and to have an efficient algorithm for deciding the membership of any problem instance in one of the subclasses considered. To achieve this goal we introduce the concept of approximation stability.

Informally, one can describe the idea of our concept by the following scenario. One has an optimization problem for two sets of inputs $L_1$ and $L_2$, $L_1 \subset L_2$. For



^1 Even no $f(n)$-approximation algorithm exists for f polynomial in the input size n.

^2 Note that $\Delta$-TSP is APX-hard and we know even explicit lower bounds on its inapproximability [En99, BHKSU00].

^3 Obviously, there are many similar examples where by restricting the set of inputs one crosses the border between decidability and undecidability (Post Correspondence Problem) or the border between P and NP (SAT and 2-SAT, or vertex cover problem).
$L_1$ there exists a polynomial-time $\delta$-approximation algorithm A for some $\delta > 1$, but for $L_2$ there is no polynomial-time $\gamma$-approximation algorithm for any $\gamma > 1$ (if NP is not equal to P). We pose the following question: Is the use of algorithm A really restricted to inputs from $L_1$? Let us consider a distance measure d in $L_2$ determining the distance $d(x)$ between $L_1$ and any given input $x \in L_2 - L_1$. Now, one can consider an input $x \in L_2 - L_1$ with $d(x) \le k$ for some positive real k. One can ask how "good" the algorithm A is for the input $x \in L_2 - L_1$. If for every $k > 0$ and every x with $d(x) \le k$, A computes a $\gamma_{k,\delta}$-approximation of an optimal solution for x ($\gamma_{k,\delta}$ is considered to be a constant depending on k and $\delta$ only), then one can say that A is "(approximation) stable" according to the distance measure d. Obviously, such a concept makes it possible to show positive results extending the applicability of known approximation algorithms. On the other hand it can help to show the boundaries of the use of approximation algorithms and possibly even a new kind of hardness of optimization problems.

Observe that the idea of the concept of approximation stability is similar to 
that of stability of numerical algorithms. Instead of observing the size of the 
change of the output value according to a small change of the input value, one 
looks for the size of the change of the approximation ratio according to a small 
change in the specification of the set of consistent input instances. 

To demonstrate the applicability of our new approach we consider TSP, $\Delta$-TSP, and, for every real $\beta > 1$, $\Delta_\beta$-TSP containing all input instances with $cost(u,v) \le \beta \cdot (cost(u,x) + cost(x,v))$ for all vertices u, v, x. If an input is consistent for $\Delta_\beta$-TSP we say that its distance to $\Delta$-TSP is at most $\beta - 1$. We will show that known approximation algorithms for $\Delta$-TSP are unstable according to this distance measure. But we will find a way to modify the Christofides algorithm in order to obtain approximation algorithms for $\Delta$-TSP that are stable according to this distance measure. So, this effort results in a $(\frac{3}{2}\cdot\beta^2)$-approximation algorithm for $\Delta_\beta$-TSP.^4 This improves the result of Andreae and Bandelt [AB95] who presented a $(\frac{3}{2}\beta^2 + \frac{1}{2}\beta)$-approximation algorithm for $\Delta_\beta$-TSP. Our approach essentially differs from that of [AB95], because in order to design our $(\frac{3}{2}\cdot\beta^2)$-approximation algorithm we modify the Christofides algorithm while Andreae and Bandelt obtain their approximation ratio by modifying the original 2-approximation algorithm for $\Delta$-TSP.

Note that, after this paper was written, we learned about the independent, unpublished result of Bender and Chekuri, accepted for WADS'99 [BC99]. They designed a $4\beta$-approximation algorithm which can be seen as a modification of the 2-approximation algorithm for $\Delta_\beta$-TSP. Despite this nice result, there are three reasons to consider our algorithm. First, our algorithm provides a better approximation ratio for $\beta < \frac{8}{3}$. Secondly, in the previous work [AB95], the authors claim that the Christofides algorithm cannot be modified

^4 Note that in this way we obtain an approximate solution to every problem instance of TSP, where the approximation ratio depends on the distance of this problem instance to $\Delta$-TSP. Following the discussion in [Ar98] about typical properties of real problem instances of TSP, our approximation algorithm working in $O(n^3)$ time is of practical relevance.
in order to get a stable (in our terminology) algorithm for TSP, and our result disproves this conjecture. This is especially of practical importance, since for instances where the triangle inequality is violated only by a few edge costs, one can expect that the approximation ratio will be as in the underlying algorithm with a high probability. Finally, our algorithm is a practical $O(n^3)$-algorithm. This cannot be said about the $4\beta$-approximation algorithm from [BC99]. The first
part of the latter algorithm is a 2-approximation algorithm for finding minimal 
two-connected subgraphs with time complexity O(n^). For the second part, con- 
structing a Hamiltonian tour in S'^ (if S was the two-connected subgraph) , there 
exist only proofs saying that it can be implemented in polynomial time, but no 
low-degree polynomial upper bound on the time complexity of these procedures 
has been established. 

This paper is organized as follows: In Section 2 we introduce our concept of 
approximation stability. In Section 3 we show how to apply our concept in the 
study of the TSP, and in Section 4 we discuss the potential applicability and 
usefulness of our concept. 

2 Definition of the Stability of Approximation Algorithms 

We assume that the reader is familiar with the basic concepts and notions of algo- 
rithmics and complexity theory as presented in standard textbooks like [BC93], 
[CLR90], [GJ79], [Ho96], [Pa94]. Next, we give a new definition of the notion of 
an optimization problem. The reason to do this is to obtain the possibility to 
study the influence of the input sets on the hardness of the problem considered. 
Let $\mathbb{N} = \{0, 1, 2, \dots\}$ be the set of nonnegative integers, let $\mathbb{R}^+$ be the set of positive reals, and let $\mathbb{R}^{\ge a}$ be the set of all reals greater than or equal to a for some $a \in \mathbb{R}$.

Definition 1. An optimization problem U is a 7-tuple $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$, where

1. $\Sigma_I$ is an alphabet called input alphabet,

2. $\Sigma_O$ is an alphabet called output alphabet,

3. $L \subseteq \Sigma_I^*$ is a language over $\Sigma_I$ called the language of consistent inputs,

4. $L_I \subseteq L$ is a language over $\Sigma_I$ called the language of actual inputs,

5. $\mathcal{M}$ is a function from L to $2^{\Sigma_O^*}$, where, for every $x \in L$, $\mathcal{M}(x)$ is called the set of feasible solutions for the input x,

6. cost is a function, called cost function, from $\bigcup_{x \in L} (\mathcal{M}(x) \times \{x\})$ to $\mathbb{R}^{\ge 0}$,

7. $goal \in \{minimum, maximum\}$.

For every $x \in L$, we define

$$Output_U(x) = \{y \in \mathcal{M}(x) \mid cost(y,x) = goal\{cost(z,x) \mid z \in \mathcal{M}(x)\}\}$$

and

$$Opt_U(x) = cost(y,x) \text{ for some } y \in Output_U(x).$$
Clearly, the meaning of $\Sigma_I$, $\Sigma_O$, $\mathcal{M}$, cost and goal is the usual one. L may be considered as the set of consistent inputs, i.e., the inputs for which the optimization problem is consistently defined. $L_I$ is the set of inputs considered, and only these inputs are taken into account when one determines the complexity of the optimization problem U. This kind of definition is useful for considering the complexity of optimization problems parameterized according to their languages of actual inputs. In what follows, $Language(U)$ denotes the language $L_I$ of actual inputs of U. If the input x is fixed, we usually use $cost(y)$ instead of $cost(y,x)$ in what follows.

Definition 2. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ be an optimization problem. We say that an algorithm A is a consistent algorithm for U if, for every input $x \in L_I$, A computes an output $A(x) \in \mathcal{M}(x)$. We say that A solves U if, for every $x \in L_I$, A computes an output $A(x)$ from $Output_U(x)$. The time complexity of A is defined as the function

$$Time_A(n) = \max\{Time_A(x) \mid x \in L_I \cap \Sigma_I^n\}$$

from $\mathbb{N}$ to $\mathbb{N}$, where $Time_A(x)$ is the length of the computation of A on x.



Next, we give the definitions of standard notions in the area of approximation 
algorithms (see e.g. [CK98], [Ho96]). 



Definition 3. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ be an optimization problem, and let A be a consistent algorithm for U. For every $x \in L_I$, the approximation ratio $R_A(x)$ is defined as

$$R_A(x) = \max\left\{\frac{cost(A(x))}{Opt_U(x)},\ \frac{Opt_U(x)}{cost(A(x))}\right\}.$$

For any $n \in \mathbb{N}$, we define the approximation ratio of A as

$$R_A(n) = \max\{R_A(x) \mid x \in L_I \cap \Sigma_I^n\}.$$

For any positive real $\delta > 1$, we say that A is a $\delta$-approximation algorithm for U if $R_A(x) \le \delta$ for every $x \in L_I$.

For every function $f : \mathbb{N} \to \mathbb{R}^{\ge 1}$, we say that A is an $f(n)$-approximation algorithm for U if $R_A(n) \le f(n)$ for every $n \in \mathbb{N}$.
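As a small illustration of Definition 3, the following sketch computes the approximation ratio from a pair of costs and checks the $\delta$-approximation condition on a finite sample of inputs. The function names and the restriction to strictly positive costs are assumptions made here (the ratio is undefined when either cost is zero).

```python
def approximation_ratio(cost_alg, opt):
    """R_A(x) = max{cost(A(x))/Opt_U(x), Opt_U(x)/cost(A(x))}.

    Assumes both costs are strictly positive; the ratio is undefined otherwise.
    """
    if cost_alg <= 0 or opt <= 0:
        raise ValueError("costs must be positive")
    return max(cost_alg / opt, opt / cost_alg)

def is_delta_approximation(samples, delta):
    """Check R_A(x) <= delta on a finite sample of (cost(A(x)), Opt_U(x)) pairs."""
    return all(approximation_ratio(c, o) <= delta for c, o in samples)

# A minimization-style pair (15 vs. optimum 10) and a maximization-style pair (8 vs. 10).
samples = [(15, 10), (8, 10)]
print([approximation_ratio(c, o) for c, o in samples])
print(is_delta_approximation(samples, 1.5))
```

Taking the maximum of both quotients is what lets one definition cover minimization and maximization problems alike.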

In what follows, we consider the standard definitions of the classes NPO, 
PO, APX (see e.g. [Ho96],[MPS98]). In order to define the notion of stability of 
approximation algorithms we need to consider something like a distance between 
a language L and a word outside L. 

Definition 4. Let $U = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal)$ and $\overline{U} = (\Sigma_I, \Sigma_O, L, L, \mathcal{M}, cost, goal)$ be two optimization problems with $L_I \subset L$. A distance function for $\overline{U}$ according to $L_I$ is any function $h_L : L \to \mathbb{R}^{\ge 0}$ satisfying the properties

1. $h_L(x) = 0$ for every $x \in L_I$, and

2. $h_L$ can be computed in polynomial time.

Let h be a distance function for $\overline{U}$ according to $L_I$. We define, for any $r \in \mathbb{R}^+$,

$$Ball_{r,h}(L_I) = \{w \in L \mid h(w) \le r\}.$$

Let A be a consistent algorithm for $\overline{U}$, and let A be an $\varepsilon$-approximation algorithm for U for some real $\varepsilon > 1$. Let p be a positive real. We say that A is p-stable according to h if, for every real $0 < r \le p$, there exists a real $\delta_{r,\varepsilon} > 1$ such that A is a $\delta_{r,\varepsilon}$-approximation algorithm for $U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$.^5

A is stable according to h if A is p-stable according to h for every $p \in \mathbb{R}^+$. We say that A is unstable according to h if A is not p-stable for any $p \in \mathbb{R}^+$. For every positive integer r, and every function $f_r : \mathbb{N} \to \mathbb{R}^{\ge 1}$, we say that A is $(r, f_r(n))$-quasistable according to h if A is an $f_r(n)$-approximation algorithm for $U_r = (\Sigma_I, \Sigma_O, L, Ball_{r,h}(L_I), \mathcal{M}, cost, goal)$.

A discussion about the potential usefulness of our concept is given in the last 
section. In the next section we show a transparent application of our concept for 
TSP. 



3 Stability of Approximation Algorithms and TSP 

We consider the well-known TSP problem (see e.g. [LLRS85]) that is in its general form very hard for approximation. But if one considers complete graphs in which the triangle inequality holds, then we have a $\frac{3}{2}$-approximation algorithm due to Christofides [Chr76]. So, this is a suitable starting point for the application of our approach based on approximation stability. First, we define two natural distance measures and show that the Christofides algorithm is stable according to one of them, but not according to the second one. This leads to the development of a new algorithm, PMCA, for $\Delta_\beta$-TSP. This algorithm is achieved by modifying the Christofides algorithm in such a way that the resulting algorithm is stable according to the second distance measure, too. In this way, we obtain a $(\frac{3}{2}\cdot(1+r)^2)$-approximation algorithm for every input instance of TSP with distance at most r from $Language(\Delta\text{-TSP})$, i.e. with $cost(u,v) \le (1+r)\cdot(cost(u,w) + cost(w,v))$ for every three nodes u, v, w. This improves the result of Andreae and Bandelt [AB95] who achieved approximation ratio $\frac{3}{2}(1+r)^2 + \frac{1}{2}(1+r)$.

To start our investigation, we concisely review two well-known algorithms for $\Delta$-TSP: the 2-approximative algorithm 2APPR and the $\frac{3}{2}$-approximative Christofides algorithm [Chr76], [Ho96].



^5 Note that $\delta_{r,\varepsilon}$ is a constant depending on r and $\varepsilon$ only.
Algorithm 2APPR

Input: A complete graph $G = (V, E)$ with a cost function $cost : E \to \mathbb{R}^{\ge 0}$ satisfying the triangle inequality (for every $u, v, q \in V$, $cost(u,v) \le cost(u,q) + cost(q,v)$).

Step 1a: Construct a minimal spanning tree T of G. (The cost of T is surely smaller than the cost of the optimal Hamiltonian tour.)

Step 1b: Construct an Eulerian tour D on T going twice via every edge of T. (The cost of D is exactly twice the cost of T.)

Step 2: Construct a Hamiltonian tour H from D by avoiding the repetition of nodes in the Eulerian tour. (In fact, H is the permutation of nodes of G, where the order of a node v is given by the first occurrence of v in D.)

Output: H.
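The steps of 2APPR can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is given as a symmetric cost matrix, the spanning tree is built with Prim's algorithm, and the shortcut of Step 2 is realized as a preorder traversal of the tree (which lists the nodes by first occurrence in the Eulerian tour of the doubled tree). The names `two_appr` and `tour_cost` and the sample point set are hypothetical.

```python
import math

def two_appr(cost):
    """Return a Hamiltonian tour as a node list whose last entry repeats the start node."""
    n = len(cost)
    # Step 1a: minimal spanning tree by Prim's algorithm, rooted at node 0.
    parent = [None] * n
    best = [math.inf] * n
    best[0] = 0.0
    in_tree = [False] * n
    children = [[] for _ in range(n)]
    for _ in range(n):
        v = min((u for u in range(n) if not in_tree[u]), key=lambda u: best[u])
        in_tree[v] = True
        if parent[v] is not None:
            children[parent[v]].append(v)
        for w in range(n):
            if not in_tree[w] and cost[v][w] < best[w]:
                best[w] = cost[v][w]
                parent[w] = v
    # Steps 1b and 2: a preorder walk visits the nodes in the order of their
    # first occurrence in the Eulerian tour that uses every tree edge twice.
    tour, stack = [], [0]
    while stack:
        v = stack.pop()
        tour.append(v)
        stack.extend(reversed(children[v]))
    return tour + [0]

def tour_cost(cost, tour):
    """Sum of the edge costs along consecutive tour nodes."""
    return sum(cost[a][b] for a, b in zip(tour, tour[1:]))

# Hypothetical Euclidean instance: the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
cost = [[math.dist(p, q) for q in points] for p in points]
tour = two_appr(cost)
print(tour, tour_cost(cost, tour))
```

On inputs that satisfy the triangle inequality the returned tour costs at most twice the spanning tree and hence at most twice the optimum. Nothing in the code enforces the triangle inequality, which is exactly why its behaviour outside $\Delta$-TSP is worth studying.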



Christofides Algorithm

Input: A complete graph $G = (V, E)$ with a cost function $cost : E \to \mathbb{R}^{\ge 0}$ satisfying the triangle inequality.

Step 1a: Construct a minimal spanning tree T of G and find a matching M with minimal cost (at most $\frac{1}{2}$ of the cost of the optimal Hamiltonian tour) on the nodes of T with odd degree.

Step 1b: Construct an Eulerian tour D on $G' = T \cup M$.

Step 2: Construct a Hamiltonian tour H from D by avoiding the repetition of nodes in the Eulerian tour.

Output: H.



Since the triangle inequality holds and Step 2 in both algorithms is realized by repeatedly shortening a path $x, u_1, \dots, u_m, y$ by the edge $(x, y)$ (because $u_1, \dots, u_m$ have already occurred before in the prefix of D), the cost of H is at most the cost of D. Thus, the crucial point for the success of 2APPR and the Christofides algorithm is the triangle inequality. A reasonable possibility to search for an extension of the application of these algorithms is to look for inputs that "almost" satisfy the triangle inequality. In what follows we do this in two different ways.

Let $\Delta$-TSP $= (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, minimum)$ be a representation of the TSP with triangle inequality. We may assume $\Sigma_I = \Sigma_O = \{0, 1, \#\}$, L contains codes of all cost functions for edges of complete graphs, and $L_I$ contains codes of cost functions that satisfy the triangle inequality. Let, for every $x \in L$, $G_x = (V_x, E_x, cost_x)$ be the complete weighted graph coded by x. Obviously, the above algorithms are consistent for $(\Sigma_I, \Sigma_O, L, L, \mathcal{M}, cost, minimum)$.

Let $\Delta_{1+r,d}$-TSP $= (\Sigma_I, \Sigma_O, L, Ball_{r,d}(L_I), \mathcal{M}, cost, minimum)$ for any $r \in \mathbb{R}^+$ and for any distance function d for $\Delta$-TSP. We define for every $x \in L$,
$$dist(x) = \max\left\{0,\ \max\left\{\frac{cost(u,v)}{cost(u,p) + cost(p,v)} - 1 \;\middle|\; u, v, p \in V_x\right\}\right\}$$

and

$$distance(x) = \max\left\{0,\ \max\left\{\frac{cost(u,v)}{\sum_{i=1}^{m} cost(p_i, p_{i+1})} - 1 \;\middle|\; u, v \in V_x, \text{ and } u = p_1, p_2, \dots, p_{m+1} = v \text{ is a simple path between } u \text{ and } v \text{ in } G_x\right\}\right\}.$$



Since the distance measure dist is the most important for us we will use the notation $\Delta_\beta$-TSP instead of $\Delta_{\beta,dist}$-TSP. For simplicity we consider the size of x as the number of nodes of $G_x$ instead of $|x|$. We observe that for every $\beta \ge 1$ the inputs from $\Delta_{\beta,dist}$-TSP have the property $cost(u,v) \le \beta\cdot(cost(u,x) + cost(x,v))$ for all u, v, x ($\beta = 1 + r$). It is a simple exercise to prove the following lemma.
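The distance measure dist can be evaluated directly from a cost matrix. The sketch below assumes a symmetric matrix with strictly positive off-diagonal costs and ranges over triples of distinct vertices; the result is clamped at 0 so that inputs satisfying the triangle inequality get distance 0. The function name `dist` mirrors the text, while the sample matrices are invented for the example.

```python
from itertools import permutations

def dist(cost):
    """Largest relative violation of the triangle inequality, clamped at 0.

    Returns max(0, max cost(u,v)/(cost(u,p)+cost(p,v)) - 1) over distinct u, v, p.
    """
    worst = 0.0
    for u, v, p in permutations(range(len(cost)), 3):
        worst = max(worst, cost[u][v] / (cost[u][p] + cost[p][v]) - 1)
    return worst

# Edge (0, 2) costs 3 although the detour via node 1 costs only 2.
violating = [[0, 1, 3],
             [1, 0, 1],
             [3, 1, 0]]
metric = [[0, 1, 2],
          [1, 0, 1],
          [2, 1, 0]]
print(dist(violating))   # 0.5, so the instance satisfies the relaxed inequality for beta = 1.5
print(dist(metric))      # 0.0
```

The triple loop takes cubic time, in line with the requirement that a distance function be computable in polynomial time.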



Lemma 1. The 2APPR and Christofides algorithm are stable according to dis- 
tance. □ 



Now, one can ask for the approximation stability according to the distance 
measure dist that is the most interesting distance measure for us. Unfortunately, 
as shown in the next lemmas, the answer is not as positive as for distance. 

Lemma 2. For every r <E IR^, Christofides algorithm is (r, | • (1 + _ 

quasistable for dist, and 2APPR is (r, 2-{l + r) ) -quasistable for dist. □ 

That the result of Lemma 2 cannot be essentially improved, is shown by 
presenting an input for which the Christofides algorithm as well as 2APPR 
provide a very poor approximation. 

Lemma 3. For every $r \in \mathbb{R}^+$, if the Christofides algorithm (or 2APPR) is $(r, f_r(n))$-quasistable for dist, then $f_r(n) \ge n^{\log_2(1+r)}/(2\cdot(1+r))$.

Proof. We construct a weighted complete graph from $Ball_{r,dist}(L_I)$ as follows. We start with the path $p_0, p_1, \dots, p_n$ for $n = 2^k$, $k \in \mathbb{N}$, where every edge $(p_i, p_{i+1})$ has the cost 1. For all other edges we take maximal possible costs in such a way that the constructed input is in $Ball_{r,dist}(L_I)$. As a consequence, for every $m \in \{1, \dots, \log_2 n\}$, we have $cost(p_i, p_{i+2^m}) = 2^m \cdot (1+r)^m$ for $i = 0, \dots, n - 2^m$ (see Figure 1).

Let us have a look at the work of the Christofides algorithm on this input. (Similar considerations can be made for 2APPR.) There is only one minimal spanning tree, which corresponds to the path containing all edges of cost 1. Since every path contains exactly two nodes of odd degree, the Eulerian graph constructed in Step 1 is the cycle $D = p_0, p_1, p_2, \dots, p_n, p_0$ with the n edges of cost 1 and the edge of the maximal cost $n \cdot (1+r)^{\log_2 n} = n^{1+\log_2(1+r)}$. Since the Eulerian path is a Hamiltonian tour, the output of the Christofides algorithm is unambiguously
Fig. 1. A hard $\Delta_{\beta,dist}$-TSP instance



the cycle $p_0, p_1, \dots, p_n, p_0$ with the cost $n + n\cdot(1+r)^{\log_2 n}$. But the optimal tour is

$$T = p_0, p_2, p_4, \dots, p_{2i}, p_{2(i+1)}, \dots, p_n, p_{n-1}, p_{n-3}, \dots, p_{2i+1}, p_{2i-1}, \dots, p_3, p_1, p_0.$$

This tour contains two edges $(p_0, p_1)$ and $(p_{n-1}, p_n)$ of the cost 1 and all $n - 2$ edges of the cost $2\cdot(1+r)$. Thus, $Opt = cost(T) = 2 + 2\cdot(1+r)\cdot(n-2)$, and

$$\frac{cost(D)}{cost(T)} = \frac{n + n^{1+\log_2(1+r)}}{2 + 2\cdot(1+r)\cdot(n-2)} \ge \frac{n^{1+\log_2(1+r)}}{2\cdot n\cdot(1+r)} = \frac{n^{\log_2(1+r)}}{2\cdot(1+r)}.$$

□
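The growth of this ratio can be observed numerically. The sketch below is an illustration under assumptions: it only models the edges that the two tours actually use (index gaps 1, 2 and n, all powers of two, with costs taken from the construction), computes the cost of both tours edge by edge, and compares the quotient with the bound of Lemma 3. The names `edge_cost`, `ratio` and `lower_bound` are hypothetical.

```python
import math

def edge_cost(i, j, r):
    """Cost of (p_i, p_j) when |i - j| = 2^m: 2^m * (1 + r)^m (m = 0 gives the path edges)."""
    gap = abs(i - j)
    m = gap.bit_length() - 1
    if gap == 0 or gap != 1 << m:
        raise ValueError("only edges whose index gap is a power of two are modelled")
    return gap * (1 + r) ** m

def ratio(k, r):
    """cost(D) / cost(T) for n = 2^k, with D the output cycle and T the zig-zag tour."""
    n = 2 ** k
    cycle = list(range(n + 1)) + [0]                                      # p_0, p_1, ..., p_n, p_0
    tour = list(range(0, n + 1, 2)) + list(range(n - 1, 0, -2)) + [0]     # evens up, odds down
    cost_d = sum(edge_cost(a, b, r) for a, b in zip(cycle, cycle[1:]))
    cost_t = sum(edge_cost(a, b, r) for a, b in zip(tour, tour[1:]))
    return cost_d / cost_t

def lower_bound(n, r):
    """The bound n^{log2(1+r)} / (2(1+r)) from Lemma 3."""
    return n ** math.log2(1 + r) / (2 * (1 + r))

for k in (2, 6, 10, 14):
    print(2 ** k, ratio(k, 0.5), lower_bound(2 ** k, 0.5))
```

For fixed r > 0 the quotient grows polynomially in n, which is why no constant $\delta_{r,\varepsilon}$ can exist for these algorithms.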



Corollary 1. 2APPR and the Christofides algorithm are unstable for dist.

The results above show that 2APPR and the Christofides algorithm can be useful for a much larger set of inputs than the original input set. But stability according to dist would provide approximation algorithms for a substantially larger class of input instances. So the key question is whether one can modify the above algorithms to get algorithms that are stable according to dist. In what follows, we give a positive answer to this question.

Theorem 1. For every $\beta \in \mathbb{R}^{\ge 1}$, there is a $(\frac{3}{2}\cdot\beta^2)$-approximation algorithm PMCA for $\Delta_{\beta,dist}$-TSP working in time $O(n^3)$.

Proof sketch. In the following, we will give a sketch of the proof of Theorem 1 by stating algorithm PMCA. The central ideas of PMCA are the following. First, we replace the minimum matching generated in the Christofides algorithm by a "minimum path matching". That means finding a pairing of the given vertices such that the vertices in a pair are connected by a path rather than a single edge, and the goal is to minimize the sum of the path costs. In this way, we obtain an Eulerian tour on the multi-graph consisting of spanning tree and path matching (in general not Hamiltonian). This Eulerian tour has a cost of at most 1.5 times the cost of an optimal TSP tour.

The second new concept concerns the substitution of sequences of edges by single ones, when transforming the above-mentioned tour into a Hamiltonian one. Here, we can guarantee that at most four consecutive edges will eventually be substituted by a single one. This may increase the cost of the tour by a factor of at most $\beta^2$ for inputs from $\Delta_\beta$-TSP (remember that we deal in what follows only with the distance function dist, and therefore drop the corresponding subscript from $\Delta_{\beta,dist}$-TSP).

Before stating the algorithm in detail, we have to introduce its main tools first. Let $G = (V, E)$ be a graph. A path matching for a set of vertices $U \subseteq V$ of even size is a collection $\Pi$ of $|U|/2$ edge-disjoint paths having U as the set of endpoints.

Assume that $p = (u_0, u_1), (u_1, u_2), \dots, (u_{k-1}, u_k)$ is a path in $(V, E)$, not necessarily simple. A bypass for p is an edge $(u, v)$ from E, replacing a sub-path $(u_i, u_{i+1}), (u_{i+1}, u_{i+2}), \dots, (u_{j-1}, u_j)$ of p from $u = u_i$ to $u_j = v$ ($0 \le i < j \le k$). Its size is the number of replaced edges, i.e. $j - i$.^6 Also, we say that the vertices $u_{i+1}, u_{i+2}, \dots, u_{j-1}$ are bypassed. Given some set of simple paths $\Pi$, a conflict according to $\Pi$ is a vertex which occurs at least two times in the given set of paths.

Algorithm PMCA

Input: a complete graph $(V, E)$ with cost function $cost : E \to \mathbb{R}^{\ge 0}$ (a $\Delta_\beta$-TSP instance for $\beta \ge 1$).

1. Construct a minimal spanning tree T of $(V, E)$.

2. Let U be the set of vertices of odd degree in T; construct a minimal (edge-disjoint) path matching $\Pi$ for U.

3. Resolve conflicts according to $\Pi$, in order to obtain a vertex-disjoint path matching $\Pi'$ with $cost(\Pi') \le \beta \cdot cost(\Pi)$ (using bypasses of size 2 only).

4. Construct an Eulerian tour $\pi$ on T and $\Pi'$. ($\pi$ can be considered as a sequence of paths $p_1, p_2, p_3, \dots$ such that $p_1, p_3, \dots$ are paths in T, and $p_2, p_4, \dots \in \Pi'$.)

5. Resolve conflicts inside the paths $p_1, p_3, \dots$ from T, such that T is divided into a forest $T_f$ of trees of degree at most 3, using bypasses of size 2 only. (Call the resulting paths $p_1', p_3', \dots$; the modified tour $\pi'$ is $p_1', p_2, p_3', p_4, \dots$)

6. Resolve every double occurrence of nodes in $\pi'$ such that the overall size of the bypasses is at most 4 (where "overall" means that a bypass constructed in Step 3 or 5 counts for two edges). Obtain tour $\pi''$.

Output: Tour $\pi''$.

In the following, we have to explain how to efficiently obtain a minimal path 
matching, and how to realize the conflict resolution in Steps 3, 5, and 6. The 
latter not only have to be efficient but must also result in substituting at most 
four edges by a single one after all. 



^6 Obviously, we are not interested in bypasses of size 1.
How to construct an Eulerian cycle in Step 4 is a well-studied task. We only observe that, since each vertex can be the endpoint of at most one path from $\Pi'$ by definition, the same holds for T: the endpoints of $p_1, p_3, \dots$ are the same as those of $p_2, p_4, \dots$.

We give in the following detailed descriptions of Steps 2, 3, 5, and 6, respectively.

Claim 1

One can construct in time $O(|V|^3)$ a minimum path matching $\Pi$ for U that has the following properties:

Every two paths in $\Pi$ are edge-disjoint. (1)

$\Pi$ forms a forest. (2)

Proof sketch.. First, we will show how to construct a path matching within the 
given time. To construct the path matching, we first compute all-pairs cheapest 
paths. ^ Then, we define G' = (V, E') where cost'{v, v') is the cost of a cheapest 
path between v and v' in G. Next, we compute a minimum matching on G' (in 
the usual sense), and finally, we substitute the edges of G' in the matching by 
the corresponding cheapest paths in G. Clearly, this can be done in time 0{'n?) 
and results in a minimum path matching. 

The claimed properties (1) and (2) are a consequence of the minimality. The 
technical details will be given in the full version of this paper. □ 
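The construction in the proof sketch can be illustrated by a small Python sketch. It is only an illustration under stated assumptions: the graph is given as a vertex count and a list of weighted undirected edges, the function names are hypothetical, and the matching step is an exhaustive search that is exponential in $|U|$, not the polynomial-time matching algorithm the claim relies on.

```python
def all_pairs_cheapest(n, edges):
    """Floyd-Warshall on an undirected graph; returns cost and next-hop tables."""
    inf = float("inf")
    cost = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    nxt = [[j if i == j else None for j in range(n)] for i in range(n)]
    for u, v, c in edges:
        if c < cost[u][v]:
            cost[u][v] = cost[v][u] = c
            nxt[u][v], nxt[v][u] = v, u
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if cost[i][k] + cost[k][j] < cost[i][j]:
                    cost[i][j] = cost[i][k] + cost[k][j]
                    nxt[i][j] = nxt[i][k]
    return cost, nxt


def cheapest_path(nxt, u, v):
    """Recover the vertex sequence of a cheapest path from u to v."""
    path = [u]
    while u != v:
        u = nxt[u][v]
        path.append(u)
    return path


def min_matching(U, cost):
    """Exhaustive minimum perfect matching on the list U (even size).

    Assumption: brute force for illustration only.
    """
    if not U:
        return 0, []
    first, rest = U[0], U[1:]
    best = (float("inf"), [])
    for i, partner in enumerate(rest):
        sub_cost, sub_pairs = min_matching(rest[:i] + rest[i + 1:], cost)
        total = cost[first][partner] + sub_cost
        if total < best[0]:
            best = (total, [(first, partner)] + sub_pairs)
    return best


def min_path_matching(n, edges, U):
    """Match the vertices of U in pairs and expand each pair to a cheapest path."""
    cost, nxt = all_pairs_cheapest(n, edges)
    total, pairs = min_matching(sorted(U), cost)
    return total, [cheapest_path(nxt, u, v) for u, v in pairs]
```

On a path 0-1-2-3 with unit costs plus an expensive edge (0, 3), matching U = {0, 3} expands the matched pair into the path through 1 and 2 rather than using the direct edge.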

The essential property of a minimal path matching for our purposes is that
it costs at most half of the cost of a minimal Hamiltonian tour. Now we show
how Step 3 of the algorithm is performed.

Claim 2 

Every path matching having properties (1) and (2) can be modified into a vertex- 
disjoint one by using bypasses of size 2 only. Moreover, on each of the new paths, 
there will be at most one bypass. 

Proof sketch. By Claim 1, every vertex used by two paths in a path matching
belongs to some tree. We will show how to resolve a tree of $\Pi$ by using bypasses
of size 2 in such a way that only vertices of the tree are affected. Then we are
done by solving all trees independently.

Let $\Pi_T$ be a subset of $\Pi$, forming a tree. For simplicity, we address $\Pi_T$ itself
as a tree. Every vertex of $\Pi_T$ being a conflict has at least three edges incident
to it, since it cannot be the endpoint of two paths in $\Pi$, and it is part of at least
two edge-disjoint paths by definition of a conflict.

We reduce the size of the problem at hand in that we eliminate paths from 
the tree by resolving conflicts. 



^ Since we associate a cost instead of a length to the edges, we speak about cheapest 
instead of shortest path. 
Procedure 1

Input: A minimal path matching $\Pi$ for some vertex set $U$ on $(V,E)$.

For all trees $\Pi_T$ of $\Pi$
    While there are conflicts in $\Pi_T$ (i.e. there is more than one path in $\Pi_T$)
        pick an arbitrary path $p \in \Pi_T$;
        if $p$ has only one conflict $v$, and $v$ is an endpoint of $p$,
            pick another path using $v$ as new $p$ instead;
        let $v_1, v_2, \ldots, v_k$ be (in this order) the conflicts in $p$;
        while $k > 1$
            consider two paths $p_1, p_k \in \Pi_T$ which use $v_1$ respectively $v_k$, commonly with $p$;
            pick as new $p$ one of $p_1, p_k$ which was formerly not picked;
            let $v_1, v_2, \ldots, v_k$ be (in this order) the conflicts in $p$;
        let $v$ be the only vertex of the finally chosen path $p$ which is a conflict;
        if $v$ has two incident edges in $p$,
            replace those with a bypass,
        else ($v$ is an endpoint of $p$)
            replace the single edge incident to $v$ in $p$ together
            with one of the previously picked paths with a bypass.

Output: the modified conflict-free path matching $\Pi'$.

The proof of the correctness of Procedure 1 is moved to the full version of this 
paper. □ 

Now we describe the implementation of Step 5 of Algorithm PMCA. It divides 
the minimal spanning tree by resolving conflicts into several trees, whose crucial 
property is that they have vertices of degree at most 3. 

Procedure 2 below is based on the following idea. First, a root of $T$ is picked.
Then, we consider a path $p_i$ in $T$ which, under the orientation w.r.t. this root,
will go up and down. The two edges immediately before and after the turning
point are bypassed. One possible view on this procedure is that the minimal
spanning tree is divided into several trees, since each bypass construction divides a
tree into two trees.

Procedure 2

Input: $T$ and the paths $p_1, p_3, p_5, \ldots$ computed in Step 4 of Algorithm PMCA.

Choose a node $r$ as a root in $T$.
For each path $p_i = (v_1, v_2), (v_2, v_3), \ldots, (v_{n_i - 1}, v_{n_i})$ in $T$ do
    Let $v_j$ be the node in $p_i$ of minimal distance to $r$ in $T$.
    If $1 < j < n_i$ then
        bypass the node $v_j$ and call this new path $p'_i$.
    else $p'_i = p_i$.

Output: The paths $p'_1, p'_3, p'_5, \ldots$, building a forest $T_f$.
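Procedure 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' implementation: the tree is an adjacency dictionary, each path is a list of vertices, the names `hop_distances` and `bypass_turning_points` are hypothetical, and distance to the root is measured in number of edges (on a tree path the vertex closest to the root is the same turning point for any positive edge costs).

```python
from collections import deque


def hop_distances(tree_adj, root):
    """Breadth-first distances (in edges) from the root of a tree."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for w in tree_adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def bypass_turning_points(tree_adj, root, paths):
    """Bypass the node closest to the root whenever it is an inner node of its path."""
    dist = hop_distances(tree_adj, root)
    result = []
    for p in paths:
        j = min(range(len(p)), key=lambda idx: dist[p[idx]])
        if 0 < j < len(p) - 1:
            # Inner turning point: drop it, joining its two neighbours directly.
            result.append(p[:j] + p[j + 1:])
        else:
            # The closest node is an endpoint: the path stays unchanged.
            result.append(list(p))
    return result
```

A path that climbs to the root and descends again loses exactly its turning point, while a path that only descends from its first vertex is returned unchanged.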

Now the following properties hold. Their proofs are given in the full version 
of this paper. 
1. If a node $v$ occurs in two different paths $p'_i$ and $p'_j$ of $T_f$, then $v$ is an inner
   node in one path and a terminal node in the other path. I.e. the node degree
   of the forest spanned by $p'_1, p'_3, p'_5, \ldots$ is at most three.

2. In $T_f$, every path has at most one bypass, and every bypass is of size two.

3. Vertices which are leaves in $T_f$ are not conflicts in $\pi'$.

4. In the cycle $p'_1, p_2, p'_3, p_4, p'_5, p_6, \ldots$, between each two bypasses there is at
   least one vertex not being a conflict.

Below, we present Procedure 3 which consecutively resolves the remaining
conflicts. Note that $s$, $t$, $u$, $v$, and their primed versions, denote occurrences of
vertices on a path, rather than the vertices themselves. In one step, Procedure 3 has
to make a choice.

Procedure 3

Input: a cycle $\pi'$ on $(V,E)$ where every vertex of $V$ occurs once or twice.

Take an arbitrary conflict, i.e. a vertex occurring twice as $u$ and $u'$ in $\pi'$;
bypass one occurrence, say $u$ (with a bypass of size 2);
while there are conflicts remaining
    if occurrence $u$ has at least one unresolved conflict as neighbor
        let $v$ be one of them, chosen by the following rule:
            If between $u$ and another bypassed vertex occurrence $t$ on
            $\pi'$, there are only unresolved conflicts, choose $v$ to be the
            neighbor of $u$ towards $t$.
        ($(v,u)$ or $(u,v)$ is an edge of $\pi'$ and there is another occurrence $v'$ of
        the same vertex as $v$)
        resolve that conflict by bypassing $v'$
    else
        resolve an arbitrary conflict;
    let $u$ be the bypassed vertex.

Output: the modified cycle $\pi''$.

The proofs of correctness of Procedure 3 and of the approximation ratio of 
PMCA are given in the full version of this paper. □ 

Theorem 1 improves the approximation ratio achieved in [AB95]. Note that
this cannot be done by modifying the approach of Andreae and Bandelt. The
crucial point of our improvement is based on the presented modification of
Christofides' algorithm, while Andreae and Bandelt conjectured in [AB95] that
Christofides' algorithm cannot be modified in order to get an approximation
algorithm for $\Delta_{\beta,dist}$-TSP.

Note that Theorem 1 can also be formulated in a general form by substituting
the parameter $\beta$ by a function $\beta(n)$, where $n$ is the number of nodes of the graph
considered.

4 Conclusion and Discussion 

In the previous sections we have introduced the concept of stability of approximations
and we have applied it to TSP. Here we discuss the potential applicability
and usefulness of this concept. Applying it, one can establish positive results of
the following types:

1. An approximation algorithm or a PTAS can be successfully used for a larger 
set of inputs than the set usually considered (see Lemma 1). 

2. We are not able to successfully apply a given approximation algorithm A (a 
PTAS) for additional inputs, but one can modify A to get a new approxi- 
mation algorithm (a new PTAS) working for a larger set of inputs than the 
set of inputs of A (see Theorem 1 and [AB95, BC99]). 

3. To learn that an approximation algorithm is unstable for a distance measure 
could lead to the development of completely new approximation algorithms 
that would be stable according to the considered distance measure. 

The following types of negative results may be achieved: 

4. The fact that an approximation algorithm is unstable according to all “reasonable”
   distance measures, so that its use is really restricted to the
   original input set.

5. Let $Q = (\Sigma_I, \Sigma_O, L, L_I, \mathcal{M}, cost, goal) \in NPO$ be well approximable. If,
   for a distance measure $d$ and a constant $r$, one proves the nonexistence of
   any approximation algorithm for $Q_{r,d} = (\Sigma_I, \Sigma_O, L, Ball_{r,d}(L_I), \mathcal{M}, cost, goal)$
   under the assumption $P \neq NP$, then this means that the problem $Q$
   is “unstable” according to $d$.

Thus, using the notion of stability one can search for a spectrum of the
hardness of a problem according to the set of input instances, which is the main
aim of our concept. This has been achieved for TSP now. Collecting results of
Theorem 1 and of [BC99], we have $\min\{\frac{3}{2}\beta^2, 4\beta\}$-approximation algorithms for
$\Delta_{\beta,dist}$-TSP, and following [BC99], $\Delta_{\beta,dist}$-TSP is not approximable within a
factor $1 + \varepsilon \cdot \beta$ for some $\varepsilon < 1$. While TSP does not seem to be tractable from
the previous point of view of approximation algorithms, using the concept of
approximation stability, it may look tractable for many specific applications.
Abstract. Counting functions can be defined syntactically or semantically
depending on whether they count the number of witnesses in a
non-deterministic or in a deterministic computation on the input. In
the Turing machine based model, these two ways of defining counting
were proven to be equivalent for many important complexity classes. In
the circuit based model, it was done for #P and #L, but for low-level
complexity classes such as $\#AC^0$ and $\#NC^1$ only the syntactical
definitions were considered. We give appropriate semantical definitions for
these two classes and prove them to be equivalent to the syntactical ones.
This enables us to show that $\#AC^0$ is included in the family of counting
functions computed by polynomial size and constant width counting
branching programs, therefore completing a result of Caussinus et al.
[CMTV98]. We also consider semantically defined probabilistic complexity
classes corresponding to $AC^0$ and $NC^1$ and prove that in the case of
unbounded error, they are identical to their syntactical counterparts.



1 Introduction 

Counting is one of the basic questions considered in complexity theory. It is a 
natural generalization of non-determinism: computing the number of solutions 
for a problem is certainly not easier than just deciding if there is a solution at 
all. Counting has been extensively investigated both in the machine based and 
in the circuit based models of computation. 

Historically, the first counting classes were defined in Turing machine based
complexity theory. Let us call a non-deterministic Turing machine an NP-machine
if it works in polynomial time, and an NL-machine if it works in logarithmic
space. In the case of a non-deterministic machine, an accepting path in its
computation tree on a string $x$ certifies that $x$ is accepted. We will call such a path
a witness for $x$. The very first, and still the most famous, counting class, called
#P, was introduced by Valiant [Val79] as the set of counting functions that map
a string $x$ to the number of witnesses for $x$ of some NP-machine. An analogous
definition was later made by Alvarez and Jenner [AJ93] for the class #L: it
is the set of counting functions that map $x$ to the number of witnesses for $x$ of
some NL-machine. These classes contain several natural complete problems: for
example, computing the permanent of a matrix is complete in #P, whereas
computing the number of paths in a directed graph between two specified vertices
is complete in #L.

The so-called “Gap” classes were defined subsequently to include functions
taking also negative values into the above model. GapP was introduced by Fenner,
Fortnow and Kurtz [FFK94] as the set of differences of two functions in #P. The
analogous definition for GapL was made independently by Vinay [Vin91], Toda
[Tod91], Damm [Dam91] and Valiant [Val92]. This latter class has received
considerable attention, mostly because it characterizes the complexity of computing
the determinant of a matrix [AO96, ST98, MV97].

Still in the Turing machine model, there is an alternative way of defining
the classes #P and #L, based on the computation of deterministic machines.
In the following discussion let us consider deterministic Turing machines acting
on pairs of strings $(x,y)$ where for some polynomial $p(n)$, the length of $y$ is
$p(|x|)$. We will say that the string $y$ is a witness for $x$ when the machine accepts
$(x,y)$, otherwise $y$ is a non-witness. We will call a deterministic Turing machine
a P-machine if it works in polynomial time, and an L-machine if it works in
logarithmic space and it has only one-way access to $y$. Then #P (respectively #L)
can be defined as the set of functions $f$ for which there exists a P-machine
(respectively L-machine) such that $f(x)$ is the number of witnesses for $x$. The
equivalence between these definitions can be established if we interpret the above
deterministic Turing machines as a normal form, with simple witness structure,
for the corresponding non-deterministic machines, where the string $y$ describes
the sequence of choices made during the computation on $x$. Nonetheless this
latter way of looking at counting has at least two advantages over the previous
one.

The first advantage is that this definition is more robust in the following
sense. Two non-deterministic machines, even if they compute the same relation
$R(x)$, might define different counting functions depending on their syntactical
properties. On the other hand, if the definition is based on deterministic machines,
only the relation they compute plays a role. Indeed, two deterministic
machines computing the same relation $R(x,y)$ will necessarily define the same
counting function independently of the syntactical properties of their computation.
Therefore, from now on, we will refer to the non-deterministic machine
based definition of counting as syntactical, and to the deterministic machine
based definition as semantical.

The second advantage of the semantical definition of counting is that probabilistic
complexity classes can be defined more naturally in that setting. For
example PP (respectively PL) is just the set of languages for which there exists
a P-machine (respectively L-machine) such that a string $x$ is in the language
exactly when there are more witnesses for $x$ than non-witnesses. In the case of the
syntactical definition of counting the corresponding probabilistic classes usually
are defined via the Gap classes.
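The semantical view can be made concrete with a brute-force sketch in Python. Everything here is an assumption for illustration: `accepts` is a toy deterministic predicate standing in for a P-machine, the witness length is passed explicitly instead of being a polynomial $p(|x|)$, and the enumeration takes exponential time.

```python
from itertools import product


def count_witnesses(accepts, x, witness_len):
    """Number of strings y of the given length such that accepts(x, y) holds."""
    return sum(
        1
        for bits in product("01", repeat=witness_len)
        if accepts(x, "".join(bits))
    )


def majority_accepts(accepts, x, witness_len):
    """PP-style acceptance: more witnesses than non-witnesses."""
    witnesses = count_witnesses(accepts, x, witness_len)
    return witnesses > 2 ** witness_len - witnesses


def accepts(x, y):
    """Toy predicate: y marks a nonempty set of positions where x has a 1."""
    return (
        len(y) == len(x)
        and "1" in y
        and all(a == "1" for a, b in zip(x, y) if b == "1")
    )
```

The counting function depends only on the relation computed by `accepts`, which is the robustness property discussed above: any other program deciding the same pairs $(x, y)$ yields the same counts.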
The above duality in the definition of counting exists of course in other models
where determinism and non-determinism are meaningful concepts. This is the
case of the circuit based model of computation. Still, in this model syntactical
counting has received considerably more attention than semantical counting.
Before we discuss the reason for that, let us make clear what we mean here
by these notions.

The syntactical notion of a witness for a string $x$ in a circuit family was
defined by Venkateswaran [Ven92] as an accepting subtree of the corresponding
circuit on $x$, which is a smallest sub-circuit certifying that the circuit’s output
is 1 on $x$. It is easy to show that the number of such witnesses is equal to the
value of the arithmetized version of the circuit on $x$. Let us stress again that
this number, and therefore the counting function defined by a circuit, depends
heavily on the specific structure of the circuit and not only on the function
computed by it. For example if we consider circuit $C_1$ which is just the variable
$x$, and circuit $C_2$ which consists of an OR gate whose both inputs are the same
variable $x$, then clearly these two circuits compute the same function. On the
other hand, on input $x = 1$, the counting function defined by $C_1$ will take the
value 1, whereas the counting function defined by the circuit $C_2$ will take the
value 2.
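The example can be checked with a short Python sketch of the arithmetized evaluation, in which OR gates add and AND gates multiply the counts of their inputs. The nested-tuple encoding of circuits and the name `proof_trees` are assumptions made for this illustration.

```python
from math import prod


def proof_trees(circuit, x):
    """Count accepting subtrees of a tree circuit by arithmetization.

    Circuits are nested tuples: ("var", i), ("nvar", i) for a negated
    variable, ("const", b), ("and", [children]) or ("or", [children]).
    """
    kind = circuit[0]
    if kind == "var":
        return x[circuit[1]]
    if kind == "nvar":
        return 1 - x[circuit[1]]
    if kind == "const":
        return circuit[1]
    counts = [proof_trees(child, x) for child in circuit[1]]
    if kind == "and":
        return prod(counts)  # one accepting subtree per combination
    if kind == "or":
        return sum(counts)   # one accepting subtree per accepting input
    raise ValueError(f"unknown gate {kind!r}")


C1 = ("var", 0)
C2 = ("or", [("var", 0), ("var", 0)])
```

Both circuits compute the same Boolean function (the count is positive exactly when $x = 1$), yet their counting functions differ on the input $x = 1$.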

For the semantical notion of a witness we consider again families whose inputs 
are pairs of strings of polynomially related lengths. As in the case of Turing 
machines, y is a witness for x if the corresponding circuit outputs 1 on (x,y). 

Venkateswaran was able to give a characterization of #P and #L in the
circuit model based on the syntactical definition of counting. His results rely
on a circuit based characterization of NP and NL. He has shown that #P is
equal to the set of counting functions computed by uniform semi-unbounded
circuits of exponential size and of polynomial algebraic degree; and #L is equal
to the set of counting functions computed by uniform skew-symmetric circuits
of polynomial size. Semantically #P can be characterized as the set of counting
functions computed by uniform polynomial size circuits.

In recent years several low level counting classes were defined in the circuit
based model, all in the syntactical setting. Caussinus et al. [CMTV98] have
defined $\#NC^1$ and Agrawal et al. [AAD97] have defined $\#AC^0$ as the set of
functions counting the number of accepting subtrees in the respective circuit
families. In subsequent works, many important properties of these classes were
established [ABL98, AAB+99]. Although some attempts were made [Yam96], no
satisfactory characterization of these classes was obtained in the semantical setting.
The main reason for that is that by simply adding “counting” bits to $AC^0$
or $NC^1$ circuits, we fall to the all too powerful counting class #P [SST95, VW96],
and it is not at all clear what type of restrictions should be made in order to
obtain $\#AC^0$ and $\#NC^1$.

The main result of this paper is such a semantical characterization of these
two counting classes. Indeed, we will define semantically the classes $\#AC^0_{CO}$ and
$\#NC^1_{CO}$ by putting some relatively simple restrictions on the structure of $AC^0$
and $NC^1$ circuits involved in the definition, and on the way they might contain
counting variables. Our main result is that this definition is equivalent to the
syntactical definition, that is we have

Theorem 1. $\#AC^0 = \#AC^0_{CO}$ and $\#NC^1 = \#NC^1_{CO}$.

Put another way, if standard $AC^0$ and $NC^1$ are seen as “non-deterministic”
circuit families in the syntactical definition of the corresponding counting classes,
we are able to characterize their “deterministic” counterparts which define the
same counting classes semantically.

We also examine the relationship between #BR, the family of counting functions
computed by polynomial size and constant width counting branching programs,
and counting circuits. While Caussinus et al. [CMTV98] proved that
$\#BR \subseteq \#NC^1$, we will show

Theorem 2. $\#AC^0 \subseteq \#BR$.

Semantically defined counting classes give rise naturally to the corresponding
probabilistic classes in the three usually considered cases: in the unbounded, in
the bounded and in the one sided error model. Indeed, we will define the probabilistic
classes $PAC^0_{CO}$, $PNC^1_{CO}$, $BPAC^0_{CO}$, $BPNC^1_{CO}$, $RAC^0_{CO}$ and $RNC^1_{CO}$.
$PAC^0$ and $PNC^1$ were already defined syntactically via $\#AC^0$ and $\#NC^1$, and
we will prove for this model that our definitions coincide with previous ones:

Theorem 3. $PAC^0_{CO} = PAC^0$ and $PNC^1_{CO} = PNC^1$.

In the other two error models, previous definitions were also semantical, but
without any restrictions on the way the corresponding circuits could use the
counting variables. We could not determine if they coincide with ours, and we
think that this question is worthy of further investigation. Nonetheless we argue
that because of their close relationship with counting branching programs, the
counting circuit based definition might be the right one.

The paper is organized as follows: Section 2 contains the definitions for semantical
circuit based counting. Section 3 exhibits the mutual simulations of
syntactical and semantical counting for the circuit classes $AC^0$ and $NC^1$. Theorem 1
is a direct consequence of Theorems 4 and 5 proven there. In Section 4 we
deal with counting branching programs, and Theorem 2 will follow from Theorem 6.
Finally in Section 5 we discuss the gap and random classes which are
derived from semantical counting circuits. Theorem 8 relating gap classes and
counting circuits will imply Theorem 3.

2 Definitions 

In this section we define counting circuit families which will be used for the
semantical definition of a counting function. Counting circuits have two types of
input variables: standard and counting ones. They are in fact restricted boolean
circuits, where the restriction is put on the way the gates and the counting
variables can be used in the circuits. First we will define the usual boolean
circuit families and the way they are used to define (syntactically) counting
functions, and then we do the same for counting circuit families. The names
“circuit” versus “counting circuit” will be used systematically this way in the
rest of the paper.

A bounded fan-in circuit with $n$ input variables is a directed acyclic graph
with vertices of in-degree 0 or 2. The vertices of in-degree 0 are called inputs,
and they are labeled with an element of the set $\{0, 1, x_1, \bar{x}_1, \ldots, x_n, \bar{x}_n\}$. The
vertices of in-degree 2 are labeled with the bounded AND or OR gate. There
is a distinguished vertex of out-degree 0; this is the output of the circuit. An
unbounded fan-in circuit is defined similarly, with the only difference that non-input
vertices can have arbitrary in-degree, and they are labeled with unbounded
AND or OR gates. A circuit family is a sequence $(C_n)_{n \in \mathbb{N}}$ of circuits where $C_n$
has $n$ input variables. It is uniform if its direct connection language is computed
in DLOGTIME. An $AC^0$ circuit family is a uniform, unbounded fan-in circuit
family of polynomial size and constant depth. An $NC^1$ circuit family is a uniform,
bounded fan-in circuit family of polynomial size and logarithmic depth.

A circuit $C$ is a tree circuit if all its vertices have out-degree 1. A proof tree
in $C$ on input $x$ is a connected subtree which contains its output, has one edge
into each OR gate, has all the edges into the AND gates, and which evaluates
to 1 on $x$. The number of proof trees in $C$ on $x$ will be denoted by $\#PT_C(x)$.
A boolean tree circuit family $(C_n)_{n \in \mathbb{N}}$ computes a function $f : \{0,1\}^* \to \mathbb{N}$ if
for every $x$, we have $f(x) = \#PT_{C_{|x|}}(x)$. We denote by $\#AC^0$ (respectively by
$\#NC^1$) the class of functions computed by a uniform $AC^0$ (respectively $NC^1$)
tree circuit family.

In order to introduce counting variables into counting circuits and to carry
out the syntactical restrictions, we use two new gates, SELECT and PADAND
gates. These are actually small circuits which will be built in a specific way from
AND and OR gates. The SELECT gates, which use a counting variable to choose
a branch of the circuit, will actually replace OR gates, which will be prohibited
in their general form. The PADAND gates will function as AND gates, but they
will allow again the introduction of counting variables. They will actually fix the
value of these counting variables to the constant 1.

We now define formally these gates. In the following we will denote single
boolean variables with a subscript such as $v_0$. Boolean vector variables will be
denoted without a subscript, such as $v$. We will also identify an integer $0 \le s \le 2^k - 1$
with its binary representation $(s_0, \ldots, s_{k-1})$.

The bounded fan-in SELECT gate will have 3 arguments. It is defined by
$\mathrm{SELECT}^1(x_0, x_1, u_0) = x_{u_0}$, and represented by $\mathrm{OR}(\mathrm{AND}(x_0, \bar{u}_0), \mathrm{AND}(x_1, u_0))$.
For every $k$, the unbounded fan-in $\mathrm{SELECT}^k$ gate has $2^k + k$ arguments and
is defined by $\mathrm{SELECT}^k(x_0, \ldots, x_{2^k-1}, u_0, \ldots, u_{k-1}) = x_u$. This gate is represented
by the circuit $\mathrm{OR}_{i=0}^{2^k-1}(\mathrm{AND}(x_i, u = i))$ where $u = i$ stands for the circuit
$\mathrm{AND}_{j=0}^{k-1}(\mathrm{OR}(\mathrm{AND}(u_j, i_j), \mathrm{AND}(\bar{u}_j, \bar{i}_j)))$. The last gate can easily be extended
to $m + k$ arguments for $m < 2^k$ as $\mathrm{SELECT}^k(x_0, \ldots, x_{m-1}, u_0, \ldots, u_{k-1}) =
\mathrm{SELECT}^k(x_0, \ldots, x_{m-1}, 0, \ldots, 0, u_0, \ldots, u_{k-1})$. Clearly, $\mathrm{SELECT}^k$ can be simulated
by a circuit of depth $O(\log k)$ containing only $\mathrm{SELECT}^1$ gates. The unbounded
fan-in PADAND gate has at least two arguments and is defined by
$\mathrm{PADAND}(x_0, u_0, \ldots, u_l) = \mathrm{AND}(x_0, u_0, \ldots, u_l)$. Its bounded fan-in equivalent
$\mathrm{PADAND}^b$ can also have an arbitrary number of arguments, and in case of $m$
arguments is represented by a circuit of depth $\lceil \log m \rceil$ consisting of a balanced
binary tree of bounded AND gates. It will always be clear from the context if
we are dealing with the bounded or the unbounded PADAND gate.
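As a sanity check of these definitions, the following Python sketch builds SELECT and PADAND from AND, OR and negated literals exactly along the representing circuits and compares SELECT with direct indexing. The helper names are hypothetical, and one point is an assumption: the text identifies an integer with its binary representation $(s_0, \ldots, s_{k-1})$ without fixing the bit order, so the sketch takes $s_0$ as the least significant bit.

```python
def AND(*bits):
    return int(all(bits))


def OR(*bits):
    return int(any(bits))


def NOT(bit):
    return 1 - bit


def equals(u, i):
    """The circuit 'u = i': bitwise equality of the vector u with the constant i."""
    k = len(u)
    i_bits = [(i >> j) & 1 for j in range(k)]  # assumption: bit 0 is least significant
    return AND(*(
        OR(AND(u[j], i_bits[j]), AND(NOT(u[j]), NOT(i_bits[j])))
        for j in range(k)
    ))


def select(xs, u):
    """SELECT(x_0, ..., x_{m-1}, u): the input whose index is encoded by u.

    With m < 2^k inputs, indices m..2^k-1 behave like padded constant 0 inputs.
    """
    return OR(*(AND(xs[i], equals(u, i)) for i in range(len(xs))))


def padand(x0, us):
    """PADAND(x_0, u_0, ..., u_l): an AND that forces every u_j to be 1."""
    return AND(x0, *us)
```

For a fixed value of the data inputs, exactly one assignment of the selector vector reaches each input, which is what lets SELECT play the role of a counting-friendly OR.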

We will define recursively unbounded fan-in counting circuits. There will be 
two types of input variables: “standard” and “counting” ones. 

Definition 1 (Counting circuit). 

- If $C$ is a boolean tree circuit, then $C$ is a counting circuit. All its variables
  are standard.

- If $C_0, \ldots, C_{2^k-1}$ are counting circuits and $u_0, \ldots, u_{k-1}$ are input variables
  which are not appearing in them, then $\mathrm{SELECT}(C_0, \ldots, C_{2^k-1}, u_0, \ldots, u_{k-1})$
  is a counting circuit. The variables $u_0, \ldots, u_{k-1}$ are counting variables.

- If $C_0, \ldots, C_k$ are counting circuits and they do not have any common counting
  variables, then $\mathrm{AND}(C_0, \ldots, C_k)$ is a counting circuit.

- If $C$ is a counting circuit and $u_0, \ldots, u_l$ are input variables, then
  $\mathrm{PADAND}(C, u_0, \ldots, u_l)$ is a counting circuit. The variables $u_0, \ldots, u_l$ are
  counting variables.

Moreover, we require that no input variable can be counting and standard at
the same time.

Bounded counting circuits are defined analogously, with $k = 1$ in all the construction
steps.

The set of all standard (respectively counting) variables of a circuit $C$ will
be denoted $SV(C)$ (respectively $CV(C)$). Let $C$ be a counting circuit with $n$
standard variables. The counting function $\#CO_C : \{0,1\}^n \to \mathbb{N}$ associated with
$C$ is defined as:

$$\#CO_C(x) = \begin{cases} C(x) & \text{if } CV(C) = \emptyset, \\ \#\{u \mid C(x,u) = 1\} & \text{if } CV(C) \neq \emptyset. \end{cases}$$

A sequence $(C_n)_{n \in \mathbb{N}}$ of counting circuits is a counting family if there exists
a polynomial $p$ such that for all $n$, $C_n$ has $n$ standard variables and at most
$p(n)$ counting variables. A family is uniform if its direct connection language is
computed in DLOGTIME. The counting function computed by a circuit family
is defined as $\#CO_{C_{|x|}}(x)$. Finally, the semantical counting classes are defined as
follows: $\#AC^0_{CO}$ (respectively $\#NC^1_{CO}$) is the set of functions computed by a
uniform $AC^0$ ($NC^1$) family of counting circuits.
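The counting function can be evaluated by brute force on small examples. The Python sketch below is an illustration only: the nested-tuple encoding, the function names, and the convention that the first selector variable is the least significant bit are all assumptions, and the enumeration is exponential in the number of counting variables.

```python
from itertools import product


def counting_vars(c):
    """Set CV(C) of counting variables of a counting circuit."""
    kind = c[0]
    if kind in ("var", "nvar", "const"):
        return set()
    if kind == "padand":  # ("padand", child, [u names])
        return set(c[2]) | counting_vars(c[1])
    own = set(c[2]) if kind == "select" else set()
    return own.union(*(counting_vars(d) for d in c[1]))


def evaluate(c, x, u):
    """Boolean value of C on standard inputs x and counting inputs u (dicts)."""
    kind = c[0]
    if kind == "var":
        return x[c[1]]
    if kind == "nvar":
        return 1 - x[c[1]]
    if kind == "const":
        return c[1]
    if kind == "and":
        return int(all(evaluate(d, x, u) for d in c[1]))
    if kind == "or":
        return int(any(evaluate(d, x, u) for d in c[1]))
    if kind == "select":  # ("select", [children], [u names])
        index = sum(u[name] << j for j, name in enumerate(c[2]))
        return evaluate(c[1][index], x, u) if index < len(c[1]) else 0
    if kind == "padand":
        return int(evaluate(c[1], x, u) and all(u[name] for name in c[2]))
    raise ValueError(f"unknown gate {kind!r}")


def count_co(c, x):
    """#CO_C(x): C(x) without counting variables, else the number of accepted u."""
    names = sorted(counting_vars(c))
    if not names:
        return evaluate(c, x, {})
    return sum(
        evaluate(c, x, dict(zip(names, bits)))
        for bits in product((0, 1), repeat=len(names))
    )
```

For instance, a SELECT over two standard variables counts how many of them are 1, and a PADAND contributes a factor of one because only the all-ones assignment of its counting variables is accepted.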



3 Circuits and Counting Circuits 

3.1 Simulating Circuits by Counting Circuits 

We will use a step-by-step simulation. We will define a function $\phi$ which maps
circuits into counting circuits by structural recursion on the output gate $G$ of
the circuit. The definition will be done for the unbounded case from which the
bounded case can be obtained by replacing the parameter $k$ with 1 in all the
construction steps, and unbounded gates by bounded ones.

Definition 2 (the $\phi$ function). If $C$ is a literal, then $\phi(C) = C$ and the
corresponding variable is standard. If $G$ is an AND gate whose entries are the
circuits $C_0, \ldots, C_k$, then let the circuits $C'_i$ be obtained from $\phi(C_i)$ by renaming
counting variables so that $\forall i \neq j$, $CV(C'_i) \cap CV(C'_j) = CV(C'_i) \cap SV(C'_j) = \emptyset$.
Then $\phi(C) = \mathrm{AND}(C'_0, \ldots, C'_k)$. If $G$ is an OR gate whose entries are the circuits
$C_0, \ldots, C_{2^k-1}$, then the circuits $C'_i$ are obtained from $\phi(C_i)$ by renaming
counting variables so that $\forall i \neq j$, $CV(C'_i) \cap CV(C'_j) = CV(C'_i) \cap SV(C'_j) = \emptyset$.
Let $V = CV(C'_0) \cup \ldots \cup CV(C'_{2^k-1})$ and $V_i = V - CV(C'_i)$. Let $C''_i$ be defined
as $\mathrm{PADAND}(C'_i, V_i)$, and let $u_0, \ldots, u_{k-1}$ be counting variables such that
$\{u_0, \ldots, u_{k-1}\} \cap V = \emptyset$. Then $\phi(C) = \mathrm{SELECT}(C''_0, \ldots, C''_{2^k-1}, u_0, \ldots, u_{k-1})$.
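The translation can be tried out on small circuits with the following Python sketch. It is a hedged illustration, not the authors' construction verbatim: circuits are nested tuples, fresh counting variable names come from a shared counter (which makes the renaming step implicit), OR gates whose fan-in is not a power of two are treated as padded with constant 0 inputs, and the PADAND is omitted when $V_i$ is empty. All function names are hypothetical.

```python
from itertools import count, product
from math import prod


def proof_trees(c, x):
    """#PT of a boolean tree circuit: OR adds, AND multiplies."""
    kind = c[0]
    if kind == "var":
        return x[c[1]]
    if kind == "nvar":
        return 1 - x[c[1]]
    if kind == "const":
        return c[1]
    vals = [proof_trees(d, x) for d in c[1]]
    return prod(vals) if kind == "and" else sum(vals)


def cv(c):
    """Counting variables of a counting circuit."""
    kind = c[0]
    if kind in ("var", "nvar", "const"):
        return set()
    if kind == "padand":
        return set(c[2]) | cv(c[1])
    own = set(c[2]) if kind == "select" else set()
    return own.union(*(cv(d) for d in c[1]))


def evaluate(c, x, u):
    kind = c[0]
    if kind == "var":
        return x[c[1]]
    if kind == "nvar":
        return 1 - x[c[1]]
    if kind == "const":
        return c[1]
    if kind == "and":
        return int(all(evaluate(d, x, u) for d in c[1]))
    if kind == "padand":
        return int(evaluate(c[1], x, u) and all(u[n] for n in c[2]))
    index = sum(u[n] << j for j, n in enumerate(c[2]))  # "select"
    return evaluate(c[1][index], x, u) if index < len(c[1]) else 0


def count_co(c, x):
    names = sorted(cv(c))
    if not names:
        return evaluate(c, x, {})
    return sum(evaluate(c, x, dict(zip(names, bits)))
               for bits in product((0, 1), repeat=len(names)))


def phi(c, fresh=None):
    """Map a boolean tree circuit to a counting circuit (sketch of Definition 2)."""
    fresh = fresh if fresh is not None else count()
    kind = c[0]
    if kind in ("var", "nvar", "const"):
        return c
    subs = [phi(d, fresh) for d in c[1]]  # fresh names keep the CV sets disjoint
    if kind == "and":
        return ("and", subs)
    # OR gate: pad every branch with the counting variables it lacks, then SELECT.
    all_cv = set().union(*(cv(s) for s in subs))
    padded = []
    for s in subs:
        missing = sorted(all_cv - cv(s))
        padded.append(("padand", s, missing) if missing else s)
    k = (len(subs) - 1).bit_length()  # smallest k with 2^k >= fan-in
    selectors = [f"u{next(fresh)}" for _ in range(k)]
    return ("select", padded, selectors)
```

On any small tree circuit, `proof_trees(C, x)` and `count_co(phi(C), x)` can then be compared over all inputs, which is the equality asserted for $\phi$.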

The next two lemmas will prove that the definition of $\phi$ is correct and that the
functions computed by the corresponding circuit families are equal.

Lemma 1. If $(C_n)$ is a uniform $AC^0$ (respectively $NC^1$) family of circuits, then
$(\phi(C_n))$ is a uniform $AC^0$ (resp. $NC^1$) family of counting circuits.

Proof. Throughout the construction, we ensured that the entry circuits of an
AND gate do not have common counting variables. Clearly, no input variable
can be counting and standard at the same time.

Since there are a polynomial number of gates and for each gate, we introduced
a polynomial number of counting variables, the number of counting variables is
bounded by a polynomial. The uniformity of $(\phi(C_n))$ follows from the uniformity
of $(C_n)$. To finish the proof, we should consider the depth of the counting circuits.

In the unbounded case, $(\phi(C_n))$ is of constant depth since the SELECT gates
which replace the OR gates of the original circuit are of constant depth.

In the bounded case, let $k$ be such that there are at most $n^k$ variables in $C_n$.
The depth of $C_n$ is $O(\log n)$. Let us define $d_i = \max\{\mathrm{depth}(\phi(D))\}$ where $D$ is
a subcircuit of $C_n$ of depth $i$. Then we have

$$d_{i+1} \le 3 + \max(d_i, k \cdot \log n)$$

since the depth increases only when the output gate is an OR. Therefore, $(\phi(C_n))$
is of logarithmic depth. □
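The last step of the proof can be spelled out by unrolling the recurrence. The following short derivation is a sketch that assumes $d_0 = 0$ (a literal has depth 0), which the text does not state explicitly.

```latex
% Claim: d_i \le 3i + k \log n for all i \ge 0, assuming d_0 = 0.
\begin{align*}
d_0 &= 0 \le k \log n, \\
d_{i+1} &\le 3 + \max(d_i,\; k \log n) \\
        &\le 3 + \max(3i + k \log n,\; k \log n) \\
        &= 3(i+1) + k \log n .
\end{align*}
% Since the depth of C_n is O(\log n), taking i = O(\log n) gives
% depth(\phi(C_n)) \le 3 \cdot O(\log n) + k \log n = O(\log n).
```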



Lemma 2. For every circuit $C$, $\#PT_C(x) = \#CO_{\phi(C)}(x)$.

Proof. We will prove this by structural recursion on the output gate $G$ of $C$. If
$G$ is a literal, then by definition, circuits and counting circuits define the same
counting function. If $G$ is an AND gate then since for $i = 0, \ldots, k$ the variables
in $CV(C'_i)$ are distinct, $\#CO_{\phi(C)}(x) = \prod_i \#CO_{C'_i}(x)$, which is the same
as $\prod_i \#CO_{\phi(C_i)}(x)$ because $C'_i$ was obtained from $\phi(C_i)$ by renaming the variables.
By the inductive hypothesis and the definition of the proof tree model,
this is equal to $\#PT_C(x)$. If $G$ is an OR gate then since the counting variables
$u_0, \ldots, u_{k-1}$ are distinct from the counting variables of the subcircuits,
$\#CO_{\phi(C)}(x) = \sum_i \#CO_{C''_i}(x)$. For every $i$, $\#CO_{C''_i}(x) = \#CO_{C'_i}(x)$ since the
PADAND gate fixes all the counting variables of $V_i$, that is, those outside $CV(C'_i)$.
This is the same value as $\#CO_{\phi(C_i)}(x)$ since $C'_i$ was obtained from $\phi(C_i)$ by
renaming the variables. The statement follows from the inductive hypothesis. □

The two lemmas imply 

Theorem 4. $\#AC^0 \subseteq \#AC^0_{CO}$ and $\#NC^1 \subseteq \#NC^1_{CO}$.

3.2 Simulating Counting Circuits by Circuits 

Let us remark first that any counting circuit $C$ can be easily transformed to
another counting circuit $C'$ computing the same counting function such that if
$\mathrm{PADAND}(D', u_0, \ldots, u_l)$ is a subcircuit of $C'$, then $\{u_0, \ldots, u_l\} \cap CV(D') = \emptyset$.
This is indeed true since if $\mathrm{PADAND}(D, u_0, \ldots, u_l)$ is a subcircuit of $C$ and if
for example $u_0 \in CV(D)$ then we can rewrite $D$ with respect to $u_0 = 1$ by
modifying SELECT and PADAND gates accordingly. For the rest of the paper,
we will suppose that counting circuits have been transformed this way.

We will use in the construction circuits computing fixed integers which are
powers of 2. For $l \ge 0$ the circuit $A_{2^l}$ computing the integer $2^l$ is defined as
follows. $A_1$ is the constant 1 and $A_2$ is $\mathrm{OR}(1,1)$. For $l \ge 2$, in the unbounded
case, $A_{2^l}$ has a topmost unbounded AND gate with $l$ subcircuits $A_2$. In the
bounded case, we replace the unbounded AND gate by its standard bounded
simulation consisting of a balanced binary tree of bounded AND gates. Clearly,
the depth of $A_{2^l}$ in the bounded case is $\lceil \log l \rceil + 1$.

We now define a function $\psi$ which maps counting circuits into circuits by
structural recursion on the output gate $G$ of the counting circuit. Again, the
definition will be done for the unbounded case, from which the bounded case
can be obtained by replacing the parameter $k$ with 1.

Definition 3 (the $\psi$ function). If $C$ is a literal, then $\psi(C) = C$. If $G$ is an
AND gate whose entries are $C_0, \ldots, C_k$, then $\psi(C) = \mathrm{AND}(\psi(C_0), \ldots, \psi(C_k))$.
If $G$ is a PADAND whose entries are $C_0$ and $u_0, \ldots, u_l$, then $\psi(C) = \psi(C_0)$. If
$G$ is a SELECT gate whose entries are $C_0, \ldots, C_{2^k-1}, u_0, \ldots, u_{k-1}$ then set
$V = CV(C_0) \cup \ldots \cup CV(C_{2^k-1})$ and $V_i = V - CV(C_i)$. We let $C'_i = \mathrm{AND}(\psi(C_i), A_{2^{|V_i|}})$
and $\psi(C) = \mathrm{OR}(C'_0, \ldots, C'_{2^k-1})$.
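The reverse translation can be sketched in Python in the same spirit. The encoding and names are assumptions: counting circuits are nested tuples built from literals, `"and"`, `"select"` and `"padand"` nodes, the example respects the normal form in which the padding variables of a PADAND do not occur below it, and `power_of_two` plays the role of the circuits $A_{2^l}$.

```python
from itertools import product
from math import prod


def power_of_two(l):
    """Circuit A_{2^l}: constant 1, OR(1,1), or an AND of l copies of OR(1,1)."""
    two = ("or", [("const", 1), ("const", 1)])
    if l == 0:
        return ("const", 1)
    return two if l == 1 else ("and", [two] * l)


def proof_trees(c, x):
    kind = c[0]
    if kind == "var":
        return x[c[1]]
    if kind == "nvar":
        return 1 - x[c[1]]
    if kind == "const":
        return c[1]
    vals = [proof_trees(d, x) for d in c[1]]
    return prod(vals) if kind == "and" else sum(vals)


def cv(c):
    kind = c[0]
    if kind in ("var", "nvar", "const"):
        return set()
    if kind == "padand":
        return set(c[2]) | cv(c[1])
    own = set(c[2]) if kind == "select" else set()
    return own.union(*(cv(d) for d in c[1]))


def evaluate(c, x, u):
    kind = c[0]
    if kind == "var":
        return x[c[1]]
    if kind == "nvar":
        return 1 - x[c[1]]
    if kind == "const":
        return c[1]
    if kind == "and":
        return int(all(evaluate(d, x, u) for d in c[1]))
    if kind == "padand":
        return int(evaluate(c[1], x, u) and all(u[n] for n in c[2]))
    index = sum(u[n] << j for j, n in enumerate(c[2]))  # "select"
    return evaluate(c[1][index], x, u) if index < len(c[1]) else 0


def count_co(c, x):
    names = sorted(cv(c))
    if not names:
        return evaluate(c, x, {})
    return sum(evaluate(c, x, dict(zip(names, bits)))
               for bits in product((0, 1), repeat=len(names)))


def psi(c):
    """Map a counting circuit to a boolean tree circuit (sketch of Definition 3)."""
    kind = c[0]
    if kind in ("var", "nvar", "const"):
        return c
    if kind == "and":
        return ("and", [psi(d) for d in c[1]])
    if kind == "padand":
        return psi(c[1])
    if kind == "select":
        all_cv = set().union(*(cv(d) for d in c[1]))
        # Branch i leaves the variables V_i free, hence the factor 2^{|V_i|}.
        return ("or", [("and", [psi(d), power_of_two(len(all_cv - cv(d)))])
                       for d in c[1]])
    raise ValueError(f"Definition 3 has no case for gate {kind!r}")
```

The factor $A_{2^{|V_i|}}$ compensates for the counting variables that branch $i$ leaves unconstrained, so that proof trees of $\psi(C)$ and accepted assignments of $C$ match one to one in number.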

Again, we proceed with two lemmas to prove the correctness of the simulation. 

Lemma 3. If $(C_n)$ is a uniform $AC^0$ (respectively $NC^1$) family of counting circuits,
then $(\psi(C_n))$ is a uniform $AC^0$ (resp. $NC^1$) family of circuits.

Proof. In the construction, we get rid of PADAND and SELECT gates and of
the counting variables. We do not modify the number of standard variables. The
only step where we increase the size or the depth of the circuit is when SELECT
gates are replaced. Each replacement introduces at most a polynomial number
of gates. Therefore the size remains polynomial. Uniformity of $(\psi(C_n))$ follows
from the uniformity of $(C_n)$.

In the unbounded case, the depth remains constant since the circuits $A_{2^l}$
have constant depth. In the bounded case, we claim that every replacement of a
SELECT gate increases the depth by a constant. This follows from the fact that
$\mathrm{depth}(A_{2^{|V_i|}}) \le \mathrm{depth}(C_{1-i}) + 1$ for $i = 0, 1$ since a bounded counting circuit
with $m$ counting variables has depth at least $\lceil \log(m + 1) \rceil$. Therefore the whole
circuit remains of logarithmic depth. □



Lemma 4. For every counting circuit $C$, $\#PT_{\psi(C)}(x) = \#CO_C(x)$.

Proof. We will prove this by structural recursion on the output gate G of the
counting circuit. In the proof, we will use the notation of Definition 3. If G is
a literal, then by definition, $C$ and $\psi(C)$ define the same counting
function. If G is an AND gate then by definition
$\#PT_{\psi(C)}(x) = \prod_{i=0}^{k} \#PT_{\psi(C_i)}(x)$. Since for
$i = 0, \ldots, k$ the subcircuits $C_i$ do not share common counting variables,
using the inductive hypothesis this is equal to $\#CO_C(x)$. If G is a PADAND
gate then from the definition and the inductive hypothesis, we have
$\#PT_{\psi(C)}(x) = \#CO_{C_0}(x)$. Since for all $i$, $u_i \notin CV(C_0)$, we
have $\#CO_{C_0}(x) = \#CO_C(x)$. If G is a SELECT gate then
$\#PT_{\psi(C)}(x) = \sum_{i=0}^{2^k-1} 2^{|V_i|} \cdot \#PT_{\psi(C_i)}(x)$.
Also, $\#CO_C(x) = \sum_{i=0}^{2^k-1} 2^{|V_i|} \cdot \#CO_{C_i}(x)$, since the
values of the variables in $V_i$ do not influence the value of the circuit $C_i$.
The result follows from the inductive hypothesis. □
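
The statement of Lemma 4 can be tested by brute force on small circuits. The sketch below is an illustration only and is not taken from the paper: the tuple encoding, the restriction to the unbounded fan-in case, and all function names (`cv`, `evaluate`, `count_co`, `power`, `psi`, `proof_trees`) are assumptions. It implements the translation of Definition 3 and compares the number of proof trees of the translated circuit with the number of accepting assignments of the counting variables.

```python
from itertools import product


def cv(c):
    """Set of counting variables of a counting circuit."""
    kind = c[0]
    if kind in ("lit", "const"):
        return set()
    if kind == "and":
        return set().union(*(cv(s) for s in c[1:]))
    if kind == "padand":                      # ("padand", C0, [u_0, ..., u_l])
        return cv(c[1]) | set(c[2])
    # ("select", [C_0, ..., C_{2^k-1}], [u_0, ..., u_{k-1}])
    return set().union(*(cv(s) for s in c[1])) | set(c[2])


def evaluate(c, x, u):
    """Value of the counting circuit on standard input x and counting input u."""
    kind = c[0]
    if kind == "const":
        return bool(c[1])
    if kind == "lit":                         # ("lit", name, value): true iff x[name] == value
        return x[c[1]] == c[2]
    if kind == "and":
        return all(evaluate(s, x, u) for s in c[1:])
    if kind == "padand":
        return evaluate(c[1], x, u) and all(u[name] for name in c[2])
    index = sum(u[name] << j for j, name in enumerate(c[2]))
    return evaluate(c[1][index], x, u)


def count_co(c, x):
    """#CO_C(x): number of counting-variable assignments making C accept x."""
    names = sorted(cv(c))
    if not names:
        return int(evaluate(c, x, {}))
    return sum(evaluate(c, x, dict(zip(names, bits)))
               for bits in product((0, 1), repeat=len(names)))


def power(n):
    """Unbounded fan-in circuit A_{2^n}."""
    one = ("const", 1)
    a2 = ("or", one, one)
    if n == 0:
        return one
    return a2 if n == 1 else ("and",) + (a2,) * n


def psi(c):
    """Translation of Definition 3 (counting circuit -> ordinary circuit)."""
    kind = c[0]
    if kind in ("lit", "const"):
        return c
    if kind == "and":
        return ("and",) + tuple(psi(s) for s in c[1:])
    if kind == "padand":
        return psi(c[1])
    v = set().union(*(cv(s) for s in c[1]))
    return ("or",) + tuple(("and", psi(s), power(len(v - cv(s)))) for s in c[1])


def proof_trees(c, x):
    """#PT_C(x) for an ordinary circuit over AND, OR, literals and constants."""
    kind = c[0]
    if kind == "const":
        return int(bool(c[1]))
    if kind == "lit":
        return int(x[c[1]] == c[2])
    counts = [proof_trees(s, x) for s in c[1:]]
    if kind == "or":
        return sum(counts)
    result = 1
    for n in counts:
        result *= n
    return result


# A small example with counting variables u, v, w0, w1.
inner = ("select", [("lit", "x1", 1), ("const", 1)], ["v"])
c0 = ("and", ("lit", "x0", 1), inner)
c1 = ("padand", ("lit", "x1", 0), ["w0", "w1"])
example = ("select", [c0, c1], ["u"])
```

Comparing `proof_trees(psi(example), x)` with `count_co(example, x)` over all four inputs `x` exercises the AND, PADAND and SELECT cases of the proof.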



Theorem 5. $\#AC^0_{CO} \subseteq \#AC^0$ and $\#NC^1_{CO} \subseteq \#NC^1$.

4 Branching Programs and Counting Circuits 

Branching programs constitute another model for defining low level counting
classes, and this possibility was first explored by Caussinus et al. [CMTV98]. Let
us recall that a counting branching program is an ordinary branching program
[Bar89] with two types of (standard and counting) variables with the restriction
that every counting variable appears at most once on each computation path.
Given a branching program $B(x, u)$ whose standard variables are $x$, the counting
function computed by it is

$$\#B(x) = \begin{cases} B(x) & \text{if } B \text{ does not have counting variables,} \\ |\{u \mid B(x,u) = 1\}| & \text{otherwise.} \end{cases}$$
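
As an illustration of this definition, the following Python sketch counts accepting settings of the counting variables by exhaustive enumeration. The dictionary encoding of branching programs and the names `run` and `count_bp` are assumptions made here, not notation from the paper, and no attempt is made to enforce the width or uniformity conditions.

```python
from itertools import product


def run(program, source, x, u):
    """Follow one computation path; the sinks are the integers 0 and 1."""
    node = source
    while node not in (0, 1):
        var, if_zero, if_one = program[node]
        value = x[var] if var in x else u[var]
        node = if_one if value else if_zero
    return node


def count_bp(program, source, x, counting_vars):
    """Counting function of a counting branching program on standard input x."""
    if not counting_vars:
        return run(program, source, x, {})
    return sum(run(program, source, x, dict(zip(counting_vars, bits)))
               for bits in product((0, 1), repeat=len(counting_vars)))


# The source queries the counting variable u and then tests x0 or NOT x0.
program = {"s": ("u", "a", "b"), "a": ("x0", 0, 1), "b": ("x0", 1, 0)}
```

In this toy program each computation path reads `u` once, so the restriction on counting variables is satisfied, and exactly one of the two settings of `u` leads to the 1-sink for either value of `x0`.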

Finally #BR is the family of counting functions computed by DLOGTIME-uniform
polynomial size and constant width counting branching programs. Caussinus
et al. [CMTV98] proved that $\#BR \subseteq \#NC^1$, but they left open the
question whether the inclusion is proper. Our results about counting circuits
enable us to show that $\#AC^0 \subseteq \#BR$. Our proof will proceed in two
steps. First we show that $\#AC^0_{CO}$ is included in #BR and then we use
Theorem 5 to conclude.

Theorem 6. $\#AC^0_{CO} \subseteq \#BR$.

Proof. We define a function $\eta$ which maps counting circuits into synchronous
counting branching programs such that for every counting circuit $C$, $\#CO_C$
is the counting function computed by $\eta(C)$. The definition will be done by
structural recursion on the output gate G of the counting circuit, and the proof
of this statement can be done without difficulty also by recursion.

If $C$ is a Boolean tree circuit, then $\eta(C)$ is a width-5 polynomial size
branching program provided by Barrington's result which recognizes the language
accepted by $C$. If $C_0$ and $C_1$ are counting circuits and $u$ is a new
counting variable then $\eta(\mathrm{SELECT}(C_0, C_1, u))$ is the branching
program whose source is labelled by $u$, and whose two successors are the source
nodes of respectively $\eta(C_0)$ and $\eta(C_1)$. If $C_0$ and $C_1$ are
counting circuits without common variables, then by increasing the width of
$\eta(C_0)$ by 1, we ensure that it has a unique 1-sink. Then
$\eta(\mathrm{AND}(C_0, C_1))$ is the branching program whose source is the
source of $\eta(C_0)$ and whose 1-sink is identified with the source of
$\eta(C_1)$. If $C$ is a counting circuit and $u_0, \ldots, u_k$ are counting
variables, then $\eta(\mathrm{PADAND}(C, u_0, \ldots, u_k)) = \eta(C')$ where
$C'$ is obtained from $C$ by setting $u_0, \ldots, u_k$ to 1.

If $(C_n)_{n=1}^{\infty}$ is an $AC^0$ family of counting circuits, then
$(\eta(C_n))$ has constant width since we double the width only a constant
number of times for the SELECT gates. Also, the uniformity of the family follows
from the uniformity of $(C_n)$. □

5 Gap and Random Classes via Semantical Counting 

In this section, we will point out another similarity between semantical counting 
circuits and deterministic Turing machine based counting: we will define proba- 
bilistic classes by counting the fraction of assignments for the counting variables 
which make the circuit accept. We will prove that in the unbounded error case, 
our definitions coincide with the syntactical definition via Gap classes. For the 
proof, we will need the notion of an extended counting circuit which may contain 
OR and PADOR gates without changing the family of counting functions they 
can compute. In the bounded error and one sided error models we could not 
determine if our definitions and the syntactical ones are identical. 



5.1 Extended Counting Circuits

By definition, the unbounded fan-in PADOR($x_0, u_0, \ldots, u_l$) gate has at
least two arguments and is defined as OR($x_0, u_0, \ldots, u_l$). Its bounded
fan-in equivalent is represented by a circuit of depth
$\lceil \log(l + 2) \rceil$, consisting of the usual bounded expansion of the
above circuit. Similarly to the case of the PADAND gate, from now on we will
suppose without loss of generality that if PADOR($D, u_0, \ldots, u_l$) is a
subcircuit of an extended counting circuit, then
$CV(D) \cap \{u_0, \ldots, u_l\} = \emptyset$.

An unbounded fan-in extended counting circuit is a counting circuit with the
following additional construction steps to Definition 1.

- if $C_0, \ldots, C_k$ are extended counting circuits and they do not have any
common counting variable then OR($C_0, \ldots, C_k$) is an extended counting
circuit.

- if $C$ is an extended counting circuit and $u_0, \ldots, u_l$ are input
variables, then PADOR($C, u_0, \ldots, u_l$) is an extended counting circuit.
The variables $u_0, \ldots, u_l$ are counting variables.

- if $C$ is a counting circuit, $C$ is an extended counting circuit.

We obtain the definition for the bounded fan-in case by taking again $k = 1$. We
will denote by $\#AC^0_{ECO}$ (respectively $\#NC^1_{ECO}$) the set of functions
computed by a uniform $AC^0$ (respectively $NC^1$) family of extended counting
circuits. The following theorem whose proof will be given in the appendix shows
that extended counting circuits are no more powerful than regular ones.

Theorem 7. $\#AC^0_{ECO} = \#AC^0_{CO}$ and $\#NC^1_{ECO} = \#NC^1_{CO}$.

Proof. Let $C$ be an extended counting circuit. We will show that there exists a
counting circuit computing $\#CO_C$ whose size and depth is of the same order of
magnitude. First observe that negation gates in $C$ can be pushed down to the
literals. For this, besides the standard de Morgan laws, one can use the following
equalities whose verification is straightforward:

$$\overline{\mathrm{SELECT}(C_0, \ldots, C_{2^k-1}, u_0, \ldots, u_{k-1})} = \mathrm{SELECT}(\overline{C_0}, \ldots, \overline{C_{2^k-1}}, u_0, \ldots, u_{k-1}),$$

$$\overline{\mathrm{PADAND}(C_0, (u_0, \ldots, u_l))} = \mathrm{PADOR}(\overline{C_0}, (\overline{u_0}, \ldots, \overline{u_l})),$$

$$\overline{\mathrm{PADOR}(C_0, (u_0, \ldots, u_l))} = \mathrm{PADAND}(\overline{C_0}, (\overline{u_0}, \ldots, \overline{u_l})).$$

Then one can get rid of the OR gates by recursively replacing
OR($C_0, \ldots, C_{2^k-1}$) with
SELECT($C_0, \ldots, C_{2^k-1}, u_0, \ldots, u_{k-1}$) where
$\{u_0, \ldots, u_{k-1}\} \cap (CV(C_0) \cup \ldots \cup CV(C_{2^k-1})) = \emptyset$.
Since $C_0, \ldots, C_{2^k-1}$ do not have any common counting variables,
this does not change the counting function computed by the counting circuit.
Finally we show how to extend the $\psi$ function of Definition 2 to PADOR gates
while keeping the same counting function as in Lemma 2. We define

$$\psi(\mathrm{PADOR}(C_0, (u_0, \ldots, u_l))) = \mathrm{OR}(A_{2^{|CV(C_0)|} \cdot (2^{l+1} - 1)}, \psi(C_0)).$$

Then, $\#PT_{\psi(C)}(x) = \#PT_{\psi(C_0)}(x) + 2^{|CV(C_0)|} \cdot (2^{l+1} - 1)$,
which by the inductive hypothesis is
$\#CO_{C_0}(x) + 2^{|CV(C_0)|} \cdot (2^{l+1} - 1)$. This in turn equals
$\#CO_C(x)$ by definition of the PADOR gate and since $u_0, \ldots, u_l$ are not
counting variables of $C_0$.

Since all these transformations may increase the size or the depth only by a
constant factor, the statement follows. □

5.2 Gap Classes 

As usual, for any counting class $\#\mathcal{C}$, we define the associated “Gap”
class Gap$\mathcal{C}$. A function $f$ is in Gap$\mathcal{C}$ iff there are two
functions $f_1$ and $f_2$ in $\#\mathcal{C}$ such that $f = f_1 - f_2$. The
following theorem will be useful for discussing probabilistic complexity classes.

Theorem 8. Let $f$ be a function in Gap$AC^0$ (respectively Gap$NC^1$). Then
there is an $AC^0$ (resp. $NC^1$) uniform family of counting circuits $(C_n)$
such that $2 \cdot f(x) = \#CO_{C_{|x|}}(x) - \#CO_{\overline{C_{|x|}}}(x)$.

Proof. In fact, we will construct a family of extended counting circuits with the
required property. Then, the result will follow from Theorem 7. Let
$\#\mathcal{C}$ be one of the classes $\#AC^0$ or $\#NC^1$ and let $f$ be a
function in Gap$\mathcal{C}$. Then there exist two functions in $\#\mathcal{C}$,
$f_1$ and $f_2$, such that $f = f_1 - f_2$. Fix an entry $x$ of length $n$. Let
us take two uniform families of counting circuits which compute $f_1$ and $f_2$,
and let respectively $D_1$ and $D_2$ be the counting circuits in these families
which have $n$ input variables.

Let $V_1 = CV(D_1)$, $V_2 = CV(D_2)$ and $m = |V_1| + |V_2|$. We will suppose
without loss of generality that $V_1 \cap V_2 = \emptyset$ (by renaming the
variables if necessary). We define $D'_1 = \mathrm{PADAND}(D_1, V_2)$ and
$D'_2 = \mathrm{PADAND}(D_2, V_1)$. Let $t$ be a counting variable such that
$t \notin V_1 \cup V_2$ and define
$C_n = \mathrm{SELECT}(D'_1, \overline{D'_2}, t)$. The counting circuit family
$(C_n)$ is uniform and is in $\mathcal{C}$ since its depth and size are changed
only up to a constant with respect to the families computing $f_1$ and $f_2$.

We first claim that the counting function associated with $C_n$ on entry $x$
computes $f_1(x) + (2^m - f_2(x))$. First, let us observe that
$\#CO_{\overline{D'_2}}(x) = 2^m - \#CO_{D'_2}(x)$. Therefore, since $t$ does
not appear in $D'_1$ and $D'_2$, we have
$\#CO_{C_n}(x) = \#CO_{D'_1}(x) + (2^m - \#CO_{D'_2}(x))$. By the definition of
the PADAND gate, the claim follows.

The number of counting variables in $C_n$ is $m + 1$. Therefore,
$\#CO_{C_n}(x) - \#CO_{\overline{C_n}}(x) = 2 \cdot \#CO_{C_n}(x) - 2^{m+1}$.
By the previous claim, this is $2f(x)$. □
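
The counting argument in this proof can be replayed numerically. In the sketch below, which is an illustration and not the authors' construction verbatim, circuits are modelled simply as Python predicates over an assignment of the counting variables for a fixed input $x$; the names `count`, `pad_and`, `d1`, `d2` and `c_n` are assumptions introduced here.

```python
from itertools import product


def count(pred, names):
    """Number of 0/1 assignments to `names` for which the predicate holds."""
    return sum(bool(pred(dict(zip(names, bits))))
               for bits in product((0, 1), repeat=len(names)))


def pad_and(pred, pad):
    """PADAND(D, pad): D AND all padding variables."""
    return lambda u: bool(pred(u)) and all(u[name] for name in pad)


# Two toy circuits on disjoint counting variables, for one fixed input x.
V1, V2 = ["a", "b"], ["c", "d", "e"]
d1 = lambda u: u["a"] or u["b"]              # f1(x) = 3
d2 = lambda u: u["c"] and u["d"] and u["e"]  # f2(x) = 1

d1_padded = pad_and(d1, V2)                  # D'_1 = PADAND(D_1, V_2)
d2_padded = pad_and(d2, V1)                  # D'_2 = PADAND(D_2, V_1)

# C_n = SELECT(D'_1, NOT D'_2, t): t = 0 selects D'_1, t = 1 selects NOT D'_2.
c_n = lambda u: (not d2_padded(u)) if u["t"] else d1_padded(u)

names = V1 + V2 + ["t"]
f = count(d1, V1) - count(d2, V2)
gap = count(c_n, names) - count(lambda u: not c_n(u), names)
```

Here `gap` is the difference between the accepting counts of $C_n$ and of its negation, which the theorem predicts to be `2 * f`.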



5.3 Random Classes via Semantical Counting 

Another advantage of semantical counting circuits is that probabilistic complex- 
ity classes can easily be defined in this model. The definition is analogous to the 
definition of probabilistic classes based on Turing machines’ computation: for a 
given input, we will count the fraction of all settings for the counting variables 
which make the circuit accept. We will define now the usual types of probabilis- 
tic counting classes in our model and compare them to the existing definitions. 
For a counting circuit $C$ we define $\Pr_{CO}(C(x))$ by:

$$\Pr_{CO}(C(x)) = \begin{cases} C(x) & \text{if } CV(C) = \emptyset, \\ |\{u \mid C(x,u) = 1\}| / 2^{|CV(C)|} & \text{if } CV(C) \neq \emptyset. \end{cases}$$

Let now $\mathcal{C}$ be one of the classes $AC^0$ or $NC^1$. Then
$P\mathcal{C}_{CO}$ is the family of languages $L$ for which there exists a
uniform $\mathcal{C}$ family of counting circuits $(C_n)$ such that $x \in L$ iff
$\Pr_{CO}(C_{|x|}(x)) > 1/2$. Similarly $BP\mathcal{C}_{CO}$ is the family of
languages for which there exists a uniform $\mathcal{C}$ family of counting
circuits $(C_n)$ such that $x \in L$ iff
$\Pr_{CO}(C_{|x|}(x)) > 1/2 + \varepsilon$ for some constant $\varepsilon > 0$.
Finally, $R\mathcal{C}_{CO}$ is the family of languages for which there exists a
uniform $\mathcal{C}$ family of counting circuits $(C_n)$ such that if $x \in L$
then $\Pr_{CO}(C_{|x|}(x)) > 1/2$ and if $x \notin L$ then
$\Pr_{CO}(C_{|x|}(x)) = 0$.
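
To make the acceptance probability concrete, here is a small Python sketch that computes it by enumeration and applies the unbounded-error threshold. It is only an illustration under assumptions: circuits are modelled as Python functions `circuit(x, u)`, and the names `pr_co`, `in_pc_co` and `select_circuit` are hypothetical.

```python
from fractions import Fraction
from itertools import product


def pr_co(circuit, x, counting_vars):
    """Fraction of counting-variable settings u for which circuit(x, u) accepts."""
    if not counting_vars:
        return Fraction(int(bool(circuit(x, {}))))
    accepting = sum(
        bool(circuit(x, dict(zip(counting_vars, bits))))
        for bits in product((0, 1), repeat=len(counting_vars))
    )
    return Fraction(accepting, 2 ** len(counting_vars))


def in_pc_co(circuit, x, counting_vars):
    """Unbounded-error acceptance: strictly more than half of the settings accept."""
    return pr_co(circuit, x, counting_vars) > Fraction(1, 2)


def select_circuit(x, u):
    """Toy circuit SELECT(x0, x1, u): outputs x1 if u = 1 and x0 otherwise."""
    return x[1] if u["u"] else x[0]
```

With this toy circuit the acceptance probability is $(x_0 + x_1)/2$, so the unbounded-error criterion accepts exactly the input with $x_0 = x_1 = 1$.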

Let us recall that $P\mathcal{C}$ was defined [AAD97, CMTV98] as the family of
languages $L$ for which there exists a function $f$ in Gap$\mathcal{C}$ such
that $x \in L$ iff $f(x) > 0$. Theorem 8 implies that these definitions coincide
with ours.

Bounded error and one-sided error circuit based probabilistic complexity
classes were defined in the literature for the classes in the AC and NC hierarchy
[Weg87, Joh90, Coo85]. These are semantical definitions in our terminology, but
unlike in our case, no special restriction is put on the way counting variables are
introduced. To be more precise, define a probabilistic circuit family $(C_n)$ as
a uniform family of circuits where the circuits have standard and probabilistic
input variables and the number of probabilistic input variables is polynomially
related to the number of input variables. For any input $x$, the probability that
such a family accepts $x$ is the fraction of assignments for the probabilistic
variables which make the circuit accept $x$. Then the usual definition of
$BP\mathcal{C}$ and $R\mathcal{C}$ is similar to that of $BP\mathcal{C}_{CO}$
and $R\mathcal{C}_{CO}$ except that probabilistic circuit families and not
counting circuit families are used in the definition.

The robustness of our definitions is underlined by the fact that the bounded
error (respectively one-sided error) probabilistic class defined via constant width
and polynomial size branching programs lies between the classes $BPAC^0_{CO}$
and $BPNC^1_{CO}$ (respectively $RAC^0_{CO}$ and $RNC^1_{CO}$). This follows
from the inclusions $\#AC^0_{CO} \subseteq \#BR \subseteq \#NC^1_{CO}$ and from
the fact that counting branching programs are also defined semantically.

As we mentioned already, it is known [SST95, VW96] that if $PAC^0$ is defined
via probabilistic and not counting circuit families, then it is equal to PP.
Therefore, it is natural to ask what happens in the other two error models: is
$BP\mathcal{C}_{CO} = BP\mathcal{C}$ and is $R\mathcal{C}_{CO} = R\mathcal{C}$?
If not, then we think that since branching programs form a natural model for
defining low level probabilistic complexity classes, the above result indicates
that counting circuits might constitute the basis of the “right” definition.



6 Conclusion 

Circuit based counting functions and probabilistic classes can be defined
semantically via counting circuit families. These circuits contain additional
counting variables whose appearances are restricted. When these definitions are
equivalent to the syntactical ones, we can rightly consider the classes robust.
In the opposite case, we think that this is a serious reason for reconsidering
the definitions.

Let us say a word about the restrictions we put on the counting variables.
They are essentially twofold: firstly they cannot appear in several subcircuits
of an AND gate and secondly they can be introduced only via the SELECT
and PADAND gates. Of these restrictions, the second one is purely technical.
Indeed, any appearance of a counting variable $u$ can be replaced by a subcircuit
SELECT(0, 1, $u$) without changing the counting function computed by the
circuit. On the other hand, the first one is essential: without it,
$\#AC^0_{CO}$ would be equal to #P.

Abstract. Map labeling is a classical key problem. The interest in this
problem has grown over the last years, because of the need to churn
out different types of maps from a growing and altering set of data. We
will show that the problem of placing street names without conflicts in
a rectangular grid of streets is NP-complete and APX-hard. This is the
first result of this type in this area. Further importance of this result
arises from the fact that the considered problem is a simple one. Each
row and column of the rectangular grid of streets contains just one street
and the corresponding name may be placed anywhere in that line.



1 Introduction and Definitions 

The street layout of some modern cities planned on a drawing table is often
right-angled. In a drawing of such a map the street names should be placed
without conflict. Thus each name should be drawn within the rectangular area
without splitting or conflicting with other names. See Figure 1 for an example.
The names are indicated by simple lines.

A map consists of an $N_h \times N_v$ grid with $N_h$ columns and $N_v$ rows. A
horizontal line, i.e. a horizontal street name, is given by the pair $(i, l)$
where $i$ ($1 \le i \le N_v$) indicates the row of the street and $l$
($1 \le l \le N_h$) the length of that name. The vertical lines are given in a
similar way, by indicating their column and height.

A placement in a map assigns to every line (street name) a position to place
the first character in. If the first character of a horizontal line $(i, l)$ is
placed in column $s$ ($1 \le s \le N_h - l + 1$), then the name will occupy the
following space in the grid: $\{(j, i) \mid s \le j \le s + l - 1\}$. Vertical
lines are placed analogously. A conflict in a placement is a position occupied
by both a vertical and a horizontal line.
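
These definitions translate directly into a few lines of code. The following Python sketch is an illustration only; the function names and the tuple layout of lines and placements are assumptions, not part of the paper. Cells are written as (column, row) pairs, matching the set $\{(j, i)\}$ above.

```python
def horizontal_cells(row, length, start_col):
    """Cells (column, row) occupied by a horizontal line starting in start_col."""
    return {(j, row) for j in range(start_col, start_col + length)}


def vertical_cells(col, height, start_row):
    """Cells (column, row) occupied by a vertical line starting in start_row."""
    return {(col, i) for i in range(start_row, start_row + height)}


def conflicts(horizontals, verticals, n_h, n_v):
    """Cells occupied by both a horizontal and a vertical line.

    horizontals: iterable of (row, length, start_col)
    verticals:   iterable of (col, height, start_row)
    """
    h_cells, v_cells = set(), set()
    for row, length, start in horizontals:
        if not (1 <= row <= n_v and 1 <= start <= n_h - length + 1):
            raise ValueError(f"horizontal line in row {row} leaves the grid")
        h_cells |= horizontal_cells(row, length, start)
    for col, height, start in verticals:
        if not (1 <= col <= n_h and 1 <= start <= n_v - height + 1):
            raise ValueError(f"vertical line in column {col} leaves the grid")
        v_cells |= vertical_cells(col, height, start)
    return h_cells & v_cells
```

For example, on an 8 by 4 grid a horizontal line of length 3 in row 2 starting in column 1 collides with a vertical line of height 2 in column 3 starting in row 1; shifting the horizontal line to start in column 4 removes the conflict.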

Given a map and a set of lines to be placed in the map, StrP denotes the
problem to decide whether the given lines may be placed conflict-free. We will
show that this problem is NP-complete. Max-StrP is the problem to find a
placement that maximizes the number of lines placed without conflict, which will
be shown to be APX-hard below. An overview on approximation problems and
hardness of approximation proofs can be found in [Ho96, MPS98].

* Supported by DFG Project HR 14/5-1 “Zur Klassifizierung der Klasse praktisch
lösbarer algorithmischer Aufgaben”

G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC 2000, LNCS 1767, pp. 102-112, 2000.
(c) Springer-Verlag Berlin Heidelberg 2000

Fig. 1. A map with street names

Map labeling problems have been investigated intensively in the last twenty 
years. Most results are in the field of heuristic and practical implementation 
[WS99]. The known complexity results are about restricted versions of the map 
labeling problem. In [KI88, FW91, MS91, KR92, IL97], NP-completeness results 
are presented for the case that each label has to be attached to a fixed point. 
The type of label and alignment varies in these papers. For other models of the 
map labeling problem see [WS99]. 

Note that this type of map labeling was motivated by practical applications
and introduced in [NW99]. There are algorithms given for solving StrP which run
in polynomial time for some special cases and perform reasonably in practical
applications. In this respect, the APX-hardness is even more surprising. For
the harder problem to place lines on a cylindrical map, there is also a proof of
NP-completeness in [NW99] which relies essentially on the cylindrical structure.

Note that our notation differs from [NW99] where for instance a vector of
street lengths for all rows and columns is given as input. But for showing
APX-hardness by a reduction from 3SAT, we need to describe efficiently a map of
order $n^2 \times n$ having only $O(n)$ lines to be placed, if $n$ is the size
of the formula.

Our reduction is based on the following result. Please note that in the
formulae constructed in the proof of that theorem, every variable is used at
least twice.¹

¹ This is not surprising at all: variables used at most once in a formula can be
assigned a truth value trivially, hence their inclusion would rather make the
problem more tractable, not harder.

Theorem 1 (J. Håstad [Ha97]).
For every small $\varepsilon > 0$, the problem to distinguish 3SAT instances
that can be satisfied from those where at most a fraction of
$\frac{7}{8} + \varepsilon$ of the clauses can be satisfied at the same time,
$(1, \frac{7}{8} + \varepsilon)$-3SAT for short, is NP-hard.

The construction and proof of NP-completeness of StrP will be presented in
Section 2. Section 3 contains the proof for the APX-completeness of Max-StrP
and Section 4 gives the conclusions.

2 Map Construction and NP-Hardness 

In this section, we give a reduction from 3SAT to StrP which we use for showing
NP-hardness as well as APX-hardness. The proof of NP-hardness comes immediately
with the construction. For the proof of APX-hardness of StrP only a small
construction detail has to be added. Thus we give in this section the full
construction, and we prove the APX-hardness of StrP in the next section.

Assume we have a 3SAT formula $\phi$ consisting of clauses $c_1, \ldots, c_l$
over variables $x_1, \ldots, x_m$. Each clause $c_i$ is of the form
$z_{i,1} \vee z_{i,2} \vee z_{i,3}$, the $z_{i,j}$ being from
$\{x_1, \ldots, x_m\} \cup \{\overline{x_1}, \ldots, \overline{x_m}\}$. We
assume w.l.o.g. that each variable occurs at least twice.





[Figure 2: three groups of vertical stripes, from left to right: Variables,
Mirror neg. occurr., Clauses]

Fig. 2. The outline of the map



First we give an informal description of the proof idea. We want to construct
a map $M_\phi$ out of $\phi$. Let $n = 3l + m$. $M_\phi$ will have height
$N_v = 14n$ and width $N_h \le 36n^2$. It will be split into several vertical
stripes, which means that there are several vertical lines of full height $N_v$.
For clear reference, we use names for the stripes. The general picture of
$M_\phi$ is shown in Figure 2.

It consists of three groups of vertical stripes. To the left, we have a pair of
vertical stripes for each variable. Here, the placement of lines corresponds to
assigning a certain value to the respective variable. The rightmost part contains
for each clause a triple of vertical stripes such that a non-conflicting placement
of all lines in these stripes can be made only if there is a clause satisfying
setting of at least one of its variables represented in the leftmost part. The
middle part, called “mirror negative occurrences”, is necessary to connect the
other two properly. It will be explained below.

To ease the description we will just define the width of these stripes and
further lines added within the stripes. Thus the position of the separating
vertical lines will be implicitly defined. The width of these stripes varies
between $6n$ and $4n - l > 3n$. The position of vertical lines put within such a
stripe will be given relative to the separating vertical lines. Let stp be a
stripe of width $w$ surrounded by vertical lines at position $c$ and
$c + w + 1$. If we say a vertical line is put at position $i$
($1 \le i \le w$) in stripe stp, this line will be put in column $c + i$ of the
whole map.

To move the information around in $M_\phi$, our main tools are horizontal lines
which fit into the above mentioned stripes. These horizontal lines will be ordered
(from top and bottom rows towards the middle) according to their length. A line
of width $6n - j$ will be in row $n + j$ or $N_v - n - j$, where
$0 \le j < 3n$. The reason for leaving the top and bottom $n$ rows unused will
become clear in Section 3. We denote by $H(j)$ a horizontal line of width
$6n - j$ put in row $n + j$ and by $\overline{H}(j)$ a horizontal line of width
$6n - j$ put in row $N_v - n - j$.




Fig. 3. A vertical stripe 



In the middle of a typical vertical stripe of width $6n - j$, we will put a
vertical line of height $N_v - 1 - (n + j)$. Consequently, either $H(j)$ or
$\overline{H}(j)$ can be placed in this stripe, by placing the middle vertical
line either at the lowest position, opening row $n + j$ for $H(j)$, or at the
highest position, opening row $N_v - n - j + 1$ for $\overline{H}(j)$.
Furthermore, no other horizontal line $H(j')$ or $\overline{H}(j')$
($0 \le j' < 3n$, $j' \ne j$) can be placed in that stripe. Either it is wider
than the whole stripe, or if it is narrower, it will be in a row which is already
blocked by the middle vertical line, see Figure 3.

There will be some exception where we may place several horizontal lines in 
the same stripe. But as we will see, the overall principle will remain unaffected. 

The details of the construction are shown in Figure 4. Here we have solid
lines depicting lines drawn in their definite position or in one of their only two
possibly non-conflicting positions. These positions are always limited to the
respective vertical stripe. Dashed lines are used for the horizontal lines which
are the essential tool to connect the different parts of the map. They always have
two possibly non-conflicting positions in two different parts of the map. For some
of them, you can see both positions in Figure 4. Finally, there is the dotted line
shown in all of its three possibly non-conflicting positions in the rightmost part.



[Figure 4: three panels (a), (b) and (c); stripe widths such as $6n$, $6n - 5$,
$6n - k$, $5n - k$ and $3n$ are marked above and below the panels]

Fig. 4. Components of the map



The Variable Part 

More precisely, we start constructing $M_\phi$ by building a pair of vertical
stripes $stp^{var}_{i,a}$, $stp^{var}_{i,b}$ as in Figure 4 (a) for each variable
$x_i$ ($1 \le i \le m$). Assume $x_i$ occurs $s_i$ times in $\phi$. Furthermore,
let $t_i = i + \sum_{j=1}^{i} s_j$ and $t_0 = 0$. Stripe $stp^{var}_{i,a}$ is
set to width $w_{i,a} = 6n - t_{i-1}$ and the width of stripe $stp^{var}_{i,b}$
is $w_{i,b} = 6n - t_i + 1 = 6n - t_{i-1} - s_i$. You will see that $i = 1$ and
$s_i = 5$ gives a picture as in Figure 4 (a). Note that in Figure 4, numbers
above and below the map give the amount of free space between vertical lines
whereas numbers to the left mark the row a horizontal line is put in.
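
The widths of the variable stripes follow mechanically from the occurrence counts. The Python sketch below is an illustration; the function name `stripe_widths` and its list-based interface are assumptions. It computes $t_i$ incrementally and returns the pairs $(w_{i,a}, w_{i,b})$.

```python
def stripe_widths(occurrences, n):
    """Return [(w_{i,a}, w_{i,b})] for i = 1..m, where occurrences[i-1] = s_i."""
    widths = []
    t_prev = 0                      # t_0 = 0
    for s in occurrences:
        t = t_prev + 1 + s          # t_i = t_{i-1} + 1 + s_i = i + s_1 + ... + s_i
        widths.append((6 * n - t_prev, 6 * n - t + 1))
        t_prev = t
    return widths


# Two clauses over two variables, each variable occurring three times:
# l = 2, m = 2, hence n = 3l + m = 8.
example_widths = stripe_widths([3, 3], n=8)
```

Since the $s_i$ sum to $3l$, the last value $t_m$ equals $m + 3l = n$, so under this reading all widths produced lie between $5n + 1$ and $6n$.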

Now we put vertical lines $v^{var}_{i,a}$ and $v^{var}_{i,b}$ in the middle of
the stripes $stp^{var}_{i,a}$ and $stp^{var}_{i,b}$ to prevent unwanted placement
of horizontal lines. Since all we need is that at most $3n$ columns are left free
to either side (all horizontal lines will be wider), it has only to be roughly
the middle. Thus we put vertical lines of height $N_v - n - t_i$ in both stripes
at the respective position $3n + 1$.

These are complemented by horizontal lines $h^{var}_i = H(t_i - 1)$ and
$\overline{h}^{var}_i = \overline{H}(t_i - 1)$. Together, these lines construct
a switch as depicted in Figure 5.



(a) (b) 

Fig. 5. Variable switch 



Our general principle of stripe construction explained above assures that the
horizontal lines of the switch cannot be placed elsewhere without conflict. Thus,
Figure 5 shows essentially the only two possible non-conflicting placements of
the lines built so far. Here, it is unimportant which placement exactly a
horizontal line $H(t_i - 1)$ or $\overline{H}(t_i - 1)$ gets in the left stripe
$stp^{var}_{i,a}$.

Next, we add a horizontal line for each occurrence of the variable $x_i$. If
$z_{j,j'}$ contains the $k$th occurrence of $x_i$ in $\phi$, then
$h^{var}_{j,j'}$ represents that occurrence. If $z_{j,j'} = x_i$ then
$h^{var}_{j,j'} = H(t_{i-1} + k)$; on the other hand, if
$z_{j,j'} = \overline{x_i}$, then
$h^{var}_{j,j'} = \overline{H}(t_{i-1} + k)$. These are the dashed lines of
Figure 4 (a).

If the switch is placed as in Figure 5 (a) all lines corresponding to positive
occurrences of $x_i$ can be placed without conflict in stripe $stp^{var}_{i,a}$.
The lines corresponding to negative occurrences cannot be placed without conflict
in the switch since stripe $stp^{var}_{i,a}$ is blocked and the stripe
$stp^{var}_{i,b}$ is narrower than all of these lines. This placement will
correspond to setting $x_i = 1$. Similarly, the other possible placement of the
switch corresponds to setting $x_i = 0$.

We will end up using just $n = 3l + m$ different widths of horizontal lines,
from $6n$ to $5n + 1$. Thus, the next step can be started by using width $5n$.

The Middle Part

The aim of the middle part is to mirror the horizontal lines representing the
negative occurrences of variables into the upper half of the map. This is done by
using a stripe $stp^{mir}_{i,j}$ as depicted in Figure 4 (b). Assume the
occurrence under consideration is $z_{i,j} = \overline{x_\nu}$.

The new stripe has the width of the horizontal line to be mirrored. Let
that line be $h^{var}_{i,j} = \overline{H}(k)$, $0 \le k < n$. The width of
stripe $stp^{mir}_{i,j}$ is $6n - k$. Now we add inside the stripe a vertical
line $v^{mir}_{i,j}$ at position $(5n - k) + 1$ of height
$N_v - 1 - (n + k)$. This reduces the width of the stripe for all inner rows from
$6n - k$ to $5n - k$. Inside this reduced stripe, we put new lines
$h^{mir}_{i,j} = H(n + k)$ and a vertical line $v'^{mir}_{i,j}$ at position
$3n + 1$ of height $N_v - 1 - (2n + k)$.

Let us look at the idea of this construction. If $x_\nu$ is set to 0 (i.e. the
lines of the variable switch are placed accordingly), $h^{var}_{i,j}$ can be
placed in $stp^{var}_{\nu,a}$. Then $v^{mir}_{i,j}$ and $v'^{mir}_{i,j}$ can be
placed in lowest position. This again allows to place $h^{mir}_{i,j}$ above it.
If $x_\nu$ is set to 1, $h^{var}_{i,j}$ must be placed in stripe
$stp^{mir}_{i,j}$. Then both $v^{mir}_{i,j}$ and $v'^{mir}_{i,j}$ are “pushed
upwards” which makes it impossible to place $h^{mir}_{i,j}$ in
$stp^{mir}_{i,j}$ without conflict.

The whole middle part of the map is made up out of stripes like that. Note
that the new horizontal lines are all of different width since so were the
original ones. We just reduce the width by $n$.



Stripes for Clauses 

Finally, we construct a triple of stripes $stp^c_{i,1}$, $stp^c_{i,2}$,
$stp^c_{i,3}$ for each clause $c_i$. Let
$c_i = z_{i,1} \vee z_{i,2} \vee z_{i,3}$, and let $4n - k > 3n$ be a new width,
not used before. Consider the horizontal lines $h_{i,1}$, $h_{i,2}$, $h_{i,3}$
representing those occurrences in the upper half of rows. That means
$h_{i,j} = h^{var}_{i,j}$ if $z_{i,j}$ is a positive occurrence of a variable,
and $h_{i,j} = h^{mir}_{i,j}$ if it is a negative occurrence (for
$j \in \{1, 2, 3\}$). For speaking about the width of the new stripes, let
$h_{i,j} = H(w_j)$ for $j \in \{1, 2, 3\}$.

For each of the three occurrences, we construct a stripe $stp^c_{i,j}$,
$j \in \{1, 2, 3\}$, analogously to those described for the middle part, except
that we use as new line width for all three stripes the same value $4n - k$.
Consequently, only one new horizontal line $h^c_i = H(2n + k)$ is created. The
width of stripe $stp^c_{i,j}$ is $W_j$ for $j \in \{1, 2, 3\}$.

Each of the three stripes will have new vertical lines: $v^c_{i,j}$ of height
$N_v - 1 - n - w_j$ at position $4n - k + 1$ (reducing the width to $4n - k$)
and $v'^c_{i,j}$ of height $N_v - 1 - 3n - k$ at position $3n$ (the “middle”
line), see Figure 4 (c).

Now the overall effect of this is as follows. Assume one of the literals
$z_{i,j}$ in a clause $c_i$ is $z_{i,j} = x_\nu$, and $x_\nu$ is set to 1,
represented by placing the lines of the variable switch for $x_\nu$ as described
above. Then the corresponding horizontal line $h_{i,j} = h^{var}_{i,j}$ can be
placed in $stp^{var}_{\nu,a}$. This leaves free the place of that line in the
corresponding stripe $stp^c_{i,j}$ of the clause. The vertical lines
$v^c_{i,j}$, $v'^c_{i,j}$ of that stripe can be placed in highest position, and
finally, the unique horizontal line $h^c_i$ can be placed in that stripe.
Similarly, assume $z_{i,j} = \overline{x_\nu}$ and $x_\nu$ is set to 0,
represented by placing the lines of the variable switch for $x_\nu$ as described
above. Then $h^{var}_{i,j}$ can be placed in $stp^{var}_{\nu,a}$ and
$h_{i,j} = h^{mir}_{i,j}$ can be placed in $stp^{mir}_{i,j}$ after placing
$v^{mir}_{i,j}$ and $v'^{mir}_{i,j}$ in lowest position. Again, this leaves free
the place of line $h_{i,j}$ in the corresponding stripe $stp^c_{i,j}$ of the
clause, and finally, $h^c_i$ can be placed in that stripe.

If on the other hand none of the three variables is set to fulfill the clause,
that causes all three horizontal lines $h_{i,1}$, $h_{i,2}$, $h_{i,3}$ to be
placed in their respective stripes $stp^c_{i,1}$, $stp^c_{i,2}$, $stp^c_{i,3}$.
But this “pushes down” the vertical lines $v^c_{i,1}$, $v'^c_{i,1}$,
$v^c_{i,2}$, $v'^c_{i,2}$, $v^c_{i,3}$, $v'^c_{i,3}$, which results in the
impossibility to place the clause representing line $h^c_i$ conflict-free.

Overall, each non-satisfied clause corresponds to an unavoidable conflict. So
far, we have shown our first result.

Theorem 2. StrP is NP-hard □ 



3 A Bound of Approximability 

In this section we will show that the previous construction can be used to obtain
thresholds on approximating StrP.

Theorem 3. For every small $\varepsilon > 0$, the problem to distinguish StrP
instances that can be placed conflict-free from those where at most a fraction of
$\frac{223}{224} + \varepsilon$ of the lines can be placed without conflict,
$(1, \frac{223}{224} + \varepsilon)$-StrP for short, is NP-hard.



Corollary 1. For every small $\varepsilon > 0$, there is no polynomial time
$(\frac{224}{223} - \varepsilon)$-approximation algorithm for Max-StrP unless
P=NP, i.e. Max-StrP is APX-hard.

Proof (of Theorem 3). In view of Theorem 1, it is sufficient to show that the
above construction satisfies the following claims.

1. From a formula $\phi$ containing $l$ clauses, the above construction yields,
in polynomial time, a map $M_\phi$ containing at most $28l$ lines (if every
variable is used at least twice).

2. There exists a polynomial time procedure $P_a$ that works on a map $M_\phi$
as follows. Given a placement $p$ where $m$ horizontal lines have conflicts,
$P_a$ outputs a placement $p'$ where at most $m$ horizontal lines have conflicts,
such that each has only one conflict. Moreover, all horizontal lines with
conflicts are of the type $h^c_i$ constructed in the last part of the above
construction.

3. There exists a polynomial time procedure $P_b$ that works on a map $M_\phi$
as follows. Given a placement $p'$ as generated by $P_a$ with $m$ conflicts,
$P_b$ generates an assignment to the variables of $\phi$ such that at most $m$
clauses of $\phi$ are not satisfied.

Then, assuming we had an algorithm A deciding
$(1, \frac{223}{224} + \varepsilon)$-StrP in polynomial time, we could get one
deciding $(1, \frac{7}{8} + 28\varepsilon)$-3SAT in polynomial time as follows.
(Note that we do not need the fact that $P_a$ and $P_b$ are efficient.)

Given a $(1, \frac{7}{8} + 28\varepsilon)$-3SAT instance $\phi$ with $l$
clauses, construct $M_\phi$ and apply the assumed decision algorithm.

If 4> is satisfiable, there is a conflict-free placement in as we have already 
seen in Section 2. 

If on the other hand at most a fraction | + 28e of the clauses of (p where 
satisfiable, we look at M^. Assume p to be a placement where a maximal number 
of lines in are placed without conflict. Let m be the number of horizontal 
lines having a conflict under placement p. That is, the fraction of lines which 
could be placed without conflict in is at most • 

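Now we apply procedures $P_a$ and $P_b$, getting an assignment satisfying at
least $l - m$ out of $l$ clauses in $\varphi$. By assumption about $\varphi$, we have

\[ \frac{l - m}{l} < \frac{7}{8} + 28\varepsilon, \quad \text{that is,} \quad m > \left(\frac{1}{8} - 28\varepsilon\right) l, \]

and hence

\[ \frac{28l - m}{28l} < \frac{28l - \left(\frac{1}{8} - 28\varepsilon\right) l}{28l} = \frac{223}{224} + \varepsilon. \]

Thus the result of algorithm $A$ would decide $(1, \frac{7}{8} + 28\varepsilon)$-3SAT, in contradiction
to Theorem 1.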

It remains to prove the above claims. 

1. As mentioned in Section 2, there are at most $\frac{3l}{2}$ variables if we assume
every variable to be used at least twice. Furthermore, we have exactly $3l$
occurrences of variables and $l$ clauses.

Constructing the leftmost part of $M_\varphi$, we have introduced 6 lines per variable
and 1 per occurrence, which gives at most $12l$ lines. In the middle part, we
need 4 new lines per negative occurrence, that is at most $6l$ lines (remember
that at most half of the occurrences are negative). Finally, in the rightmost
part of $M_\varphi$, we use 9 new vertical lines and one new horizontal line per clause,
being $10l$ new lines.

Overall, there are at most $28l$ lines to be placed in $M_\varphi$.
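
The line count of claim 1 and the threshold of Theorem 3 can be checked with exact rational arithmetic. The following is only a sanity-check sketch; the function names and the normalisation by the number of clauses are assumptions made for illustration and are not part of the construction.

```python
from fractions import Fraction

def max_lines_per_clause():
    """Upper bound on the number of lines of the map, divided by the number l of clauses."""
    occurrences = Fraction(3)            # 3l occurrences of variables
    variables = occurrences / 2          # every variable is used at least twice
    left = 6 * variables + occurrences   # 6 lines per variable, 1 per occurrence
    middle = 4 * (occurrences / 2)       # 4 lines per negative occurrence, at most half negative
    right = Fraction(9 + 1)              # 9 vertical lines and 1 horizontal line per clause
    return left + middle + right

def strp_threshold(eps):
    """Bound on the conflict-free fraction when m > (1/8 - 28*eps) * l lines have conflicts."""
    total = max_lines_per_clause()
    return (total - (Fraction(1, 8) - 28 * eps)) / total
```

Evaluating `strp_threshold` for any rational `eps` reproduces the value 223/224 plus `eps` used in the proof.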

2. We use the names of the lines and stripes as given in Section 2. 

The basic idea of $P_a$ is that each horizontal line is best placed in one of the
vertical stripes “made for it”. More precisely, $P_a$ works by modifying $p$ as
follows.

(a) For each variable Xu, place the lines 

two possible ways they may be placed without conflict among each other 
within the two stripes stpppp^pp , stpppppp for Xu- Which one of the two pos- 
sibilities is taken, is decided in such a way that the minimal number of 
conflicts between and those lines (placed as in p) occurs where 
Zi^j is an occurrence of Xu, and is placed completely within stripe 
(b) Place every line which still has a conflict in stripe stp^ j if Zij is a

positive occurrence, and in if Zi^j is a negative occurrence. 

(c) For each negative occurrence Zi^j, place u™’’ and u™’’ in their highest 
position if is placed in stpY^J^ , and in lowest position otherwise. 

(d) If u™’’, u™’’ are placed in highest position, place in stply Otherwise, 
place in stpY^-'^. 

(e) If hi^j is placed in stripe stp^y place ^ and in their lowest position, 
otherwise in their highest position. 

(f) It hi still has a conflict, place it in any of the stripes stp'l^l'^ , stpY^ 2 ^, or 

stpYii^- 

The crucial point is that none of these steps increases the number of hori- 
zontal lines having conflicts. We check this step by step. 

(a) First, /i™/' are placed without conflict here. Secondly, due to the 
general principle of the construction, all other horizontal lines than those 

representing occurrences of Xu cannot be placed without conflict 
within the stripes for Xu- And those representing occurrences of Xu may 
be placed without conflict only in the left stripe. All that remains to show 
is that the minimal number of conflict between and those lines is 
assumed in one of the extremal positions of . But now, the fact that 
the first and last n rows of are not used for horizontal lines takes 
effect. The height of is always at least Ny — 2n. Thus it is impossible 
to have horizontal lines placed both below and above at the same 
time. Consequently, from any placement of , changing it into at least 
one of its extremal positions is safe.

(b) Only horizontal lines already having conflicts are affected. 

(c) Only and may be placed in stp™'" without conflict. If is 
present, this step may move a conflict from to Otherwise, only 
a possible conflict of is resolved. 

(d) Again, only horizontal lines having conflicts are affected. 

(e) Symmetrically to step (d), a conflict may only be moved from hij to hf. 

(f) Again, only horizontal lines already having conflicts are affected. 

This guarantees that the number of horizontal lines already having conflicts 
is not increased. 

Moreover, steps (b) and (c) for negative occurrences, resp. (b) and (e) for 
positive occurrences, assure that lines are placed without conflict. Re- 
member that hi^j = for positive occurrences, and hgj = for nega- 
tive. Steps (d) and (e) guarantee the same for 

Finally, the lines hi are placed in a stripe where they may have a conflict 
with at most some v1 i.e. have at most one conflict. 

3. In Section 2, we have convinced ourselves that the “variable switches” in
the leftmost part of $M_\varphi$ can be placed without conflict only in two ways,
interpretable as setting the corresponding variable to 0 or 1. Since all
conflicts are on the rightmost part of $M_\varphi$, we can take that interpretation as
an assignment to the variables. Also in Section 2, we have seen that this
assignment satisfies a clause $c_i$ if the line $h_i$ can be placed without conflict.
Thus, if there were initially $m$ conflicts present in the placement, at most $m$
clauses will be non-satisfied. □



4 Conclusion 

We have presented the first proofs of NP-completeness and APX-hardness for a
simple map labeling problem. An algorithm with an approximation factor of two
is easy: just label either all horizontal or all vertical streets, whichever set is larger.
It remains open to close the gap between the two factors. Our results extend easily
to the case where some regions of the map may not be used for labels. It would also
be interesting to look for further extensions.
Abstract. American cities, especially their central regions, usually have a very
regular street pattern: we are given a rectangular grid of streets, and each street has to
be labeled with a name running along it, such that no two labels overlap.
For this restricted yet realistic case, an efficient algorithmic solution for the
generally hard labeling problem comes within reach.

The main contribution of this paper is an algorithm that is guaranteed to solve every
solvable instance. In our experiments the running time is polynomial without a
single exception. On the other hand, the problem was recently shown to be NP-hard.

Finally, we present efficient algorithms for three special cases, including the case
of having no labels that are more than half the length of their street.



1 Introduction 
The general city map labeling problem is too hard to be automated yet [NH00].
In this paper we focus on the downtown labeling problem, a special case that
was recently shown to be NP-hard [US00].

The clearest way to model it is to abstract a grid-shaped downtown street pattern
into a chess board of adequate size. The names to be placed along their streets
we abstract to be tiles that w.l.o.g. span an integer number of fields. A feasible
labeling then is a conflict-free tiling of the board placing all the labels along
their streets.

Fig. 1. American downtown street map.
Our main algorithm is a kind of adaptive backtracking algorithm that is guaranteed
to find a solution if there is one. Surprisingly, it has an empirically strictly bounded
depth of backtracking, namely one, which makes it an empirically efficient algorithm.
This experience makes the theoretical analysis of our algorithm even more
interesting.

Using results from a well-studied family of related problems from Discrete Tomography
[Woe96, GG95], we provide an NP-hardness result for a slightly different labeling
problem, taking place on a cylinder instead of a rectangle.

We round off the paper by giving efficient solutions to special cases: There is a
polynomial algorithm if

- no label is longer than half of its street length,

- all vertical labels are of equal length, or

- the map is quadratic and each label has one of two label lengths.

One general remark that helps to suppress a lot of formal overhead: Often, we only
discuss the case of horizontal labels or row labels and avoid the symmetric cases of
vertical labels and columns, or vice versa.

2 Problem Definition 

Let $G$ be a grid consisting of $n$ rows and $m$ columns. Let $R = \{R_1, \dots, R_n\}$ and
$C = \{C_1, \dots, C_m\}$ be two sets of labels. The problem is to label the $i$-th row of $G$ with
$R_i$ and the $j$-th column of $G$ with $C_j$ such that no two labels overlap. We will represent
the grid $G$ by a matrix.

Definition 1 (Label problem $(G, R, C, n, m)$).

Instance: Let $G_{n,m}$ be a two-dimensional array of size $n \times m$, $G_{i,j} \in \{\emptyset, r, c\}$. Let $R_i$
be the label of the $i$-th row and let $r_i$ be the length of label $R_i$, $1 \le i \le n$. Let $C_j$ be
the label of the $j$-th column and let $c_j$ be the length of label $C_j$, $1 \le j \le m$.

Problem: For each row $i$ set $r_i$ consecutive fields of $G_{i,\cdot}$ to $r$ and for each column $j$
set $c_j$ consecutive fields of $G_{\cdot,j}$ to $c$.

Of course no label can be longer than the length of its row or column, respectively.
Initially, we set $G_{i,j} = \emptyset$, which denotes that the field is not yet set. Let $[a, b[$ be
an interval such that $G_{i,x} \in \{\emptyset, r\}$ for $x \in [a, b[$. We say that $G_{i,[a,b[}$ is free for row
labeling. Furthermore, this interval has length $b - a$. We also say that $G_{i,\cdot}$ contains
two disjoint intervals of length at least $\lfloor \frac{b-a}{2} \rfloor$ that are free for row labeling, namely
$[a, a + \lfloor \frac{b-a}{2} \rfloor[$ and $[a + \lfloor \frac{b-a}{2} \rfloor, b[$.

3 General Rules 

Assume we have a label that is longer than half of its street length. No matter how
we position the label on its street, there are some central fields in the street that are
always occupied by this label. We can therefore simply mark these fields as occupied. It
is easy to see that these occupied fields can produce more occupied fields. The following
rules check whether there is sufficiently large space for each label and determine
the occupied fields.

Rule 3.1 (Conflict) Let $I = [a, b[$ be the longest interval of row $i$ that is free for row
labeling. If $r_i > b - a$, then row $i$ cannot be labeled, since it does not contain enough
free space for row labeling. In this case we say that a conflict occurred, and it follows
that the instance is not solvable.

Rule 3.2 (Large labels) Let $I = [a, b[$ be the only interval in $G_{i,\cdot}$ that is free for
feasible row labeling. Observe that the fields that are occupied both when $R_i$ is
positioned leftmost and when it is positioned rightmost in $I$ have to be occupied by $R_i$
no matter where it is placed. We set these fields to $r$ and call them preoccupied.
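
One pass of the two rules on a single row can be sketched as follows. This is a simplified illustration, not the authors' implementation: fields are indexed from 0, the markers `EMPTY`, `ROW`, `COL` and both function names are assumptions, and the sketch ignores that entries already set to r restrict the position further, so it may mark fewer fields than Rule 3.2 allows but never a wrong one.

```python
EMPTY, ROW, COL = ".", "r", "c"

def free_intervals(row):
    """Maximal half-open intervals [a, b) whose fields are EMPTY or ROW."""
    intervals, start = [], None
    for x, field in enumerate(row):
        if field != COL and start is None:
            start = x
        elif field == COL and start is not None:
            intervals.append((start, x))
            start = None
    if start is not None:
        intervals.append((start, len(row)))
    return intervals

def apply_row_rules(row, length):
    """Return (conflict, new_row) after one pass of Rule 3.1 and Rule 3.2."""
    feasible = [(a, b) for a, b in free_intervals(row) if b - a >= length]
    if not feasible:                      # Rule 3.1: no free interval is long enough
        return True, row
    new_row = list(row)
    if len(feasible) == 1:                # Rule 3.2: a single feasible interval
        a, b = feasible[0]
        # Fields covered by both the leftmost and the rightmost placement.
        for x in range(b - length, a + length):
            new_row[x] = ROW
    return False, new_row
```

For a free row of six fields and a label of length four, the two central fields are preoccupied; the same functions can be applied to columns by transposing the grid and swapping the roles of the two markers.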



Procedure 1 PREPROCESSING (G, R, C, n, m)

    repeat {
        G' = G;
        run Rule 3.2 on (G, R, C, n, m) and on (G^T, C, R, m, n);
        if Rule 3.1 yields a conflict on (G, R, C, n, m) or on (G^T, C, R, m, n) then
            return “conflict”;
    } until (G = G');
    return true;



Our PREPROCESSING Procedure 1 iteratively executes Rules 3.1 and 3.2 until
none of them yields a further change to the label problem or a conflict occurs. In the
latter case the instance is not solvable. We will spell out special cases
where successful preprocessing implies solvability. Furthermore, the preprocessing
underlies the following considerations.

For each unfixed label that is limited to just one interval of at most twice its length or
to two intervals of exactly its length we can check whether these labels can be
simultaneously positioned without conflicts. This can be done since all possible label positions
of these rows and columns can be encoded in a set of 2SAT clauses, the satisfaction
of which enforces the existence of a conflict-free label positioning of these labels. On
the other hand, a conflict-free label positioning of these labels implies a satisfying truth
assignment to the set of clauses. Even, Itai and Shamir [EIS76] proposed an
algorithm that solves the 2SAT problem in time linear in the number of clauses
and variables.

We therefore represent each matrix field $G_{i,j}$ by two boolean variables. We have
the boolean variable $(G_{i,j} = r)$ and its negation $\overline{(G_{i,j} = r)}$, which means $G_{i,j} \ne r$, that is,
$G_{i,j} \in \{\emptyset, c\}$. As the second variable we have $(G_{i,j} = c)$ and its negation $\overline{(G_{i,j} = c)}$,
which means $G_{i,j} \ne c$, that is, $G_{i,j} \in \{\emptyset, r\}$. Of course these two variables are coupled by
the relation $(G_{i,j} = r) \Rightarrow \overline{(G_{i,j} = c)}$.

We call those rows and columns dense in which the possible label positions are limited
to just one interval of at most twice the label length or to two intervals of exactly the label length.
We now encode all possible label positions of the dense rows and columns in a set of
2SAT clauses, the satisfaction of which yields a valid labeling of these rows and columns
and vice versa.
Property 1 (Density Property I). Let $G_{i,\cdot}$ be a row that contains exactly two maximal
intervals, each of length $r_i$, that are disjoint and free for feasible row labeling. Let these
intervals be $[a, b[$ and $[c, d[$, $1 \le a < b < c < d \le m + 1$. Then, a valid labeling exists
if and only if the conditions

1. $(G_{i,a} = r) \Leftrightarrow (G_{i,a+1} = r) \Leftrightarrow (G_{i,a+2} = r) \Leftrightarrow \dots \Leftrightarrow (G_{i,b-1} = r)$,

2. $(G_{i,c} = r) \Leftrightarrow (G_{i,c+1} = r) \Leftrightarrow (G_{i,c+2} = r) \Leftrightarrow \dots \Leftrightarrow (G_{i,d-1} = r)$,

3. $(G_{i,a} = r) \Leftrightarrow \overline{(G_{i,c} = r)}$

are fulfilled.

Since $(G_{i,a} = r) \Leftrightarrow (G_{i,a+1} = r)$ can be written as the 2SAT clauses
$(\overline{G_{i,a} = r} \vee G_{i,a+1} = r)$, $(G_{i,a} = r \vee \overline{G_{i,a+1} = r})$, and since the condition
$(G_{i,a} = r) \Leftrightarrow \overline{(G_{i,c} = r)}$ can be written as $(G_{i,a} = r \vee G_{i,c} = r)$,
$(\overline{G_{i,a} = r} \vee \overline{G_{i,c} = r})$, it is easy to see that
the complete Density Property 1 can be written as a set of 2SAT clauses. The feasible
label placements are $(G_{i,a} = r, \dots, G_{i,b-1} = r)$ and $(G_{i,c} = r, \dots, G_{i,d-1} = r)$.
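
The clause set of Density Property I can be generated mechanically. The sketch below is an illustration under assumptions: a literal is encoded as a pair (field index, required truth value) for the variable "field is set to r", and the brute-force enumerator is only meant for tiny examples.

```python
from itertools import product

def density_one_clauses(a, b, c, d):
    """2SAT clauses for two intervals [a, b) and [c, d) of equal length."""
    clauses = []
    for lo, hi in ((a, b), (c, d)):
        for x in range(lo, hi - 1):               # (x = r) <=> (x+1 = r)
            clauses.append(((x, False), (x + 1, True)))
            clauses.append(((x, True), (x + 1, False)))
    clauses.append(((a, True), (c, True)))        # (a = r) <=> not (c = r)
    clauses.append(((a, False), (c, False)))
    return clauses

def satisfying_assignments(clauses, fields):
    """All assignments satisfying every clause, each given as the set of fields set to r."""
    result = []
    for values in product([False, True], repeat=len(fields)):
        assign = dict(zip(fields, values))
        if all(any(assign[x] == want for x, want in clause) for clause in clauses):
            result.append({x for x in fields if assign[x]})
    return result
```

For the intervals [1, 4[ and [5, 8[ the only satisfying assignments are the two feasible placements, one per interval.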

Property 2 (Density Property II). Let $G_{i,\cdot}$ be a row that contains only one maximal
interval $[a, b[$ that is free for feasible row labeling, $r_i \le b - a \le 2r_i$. Then, a valid
labeling for the row exists if and only if the conditions

1. $(G_{i,a} = r) \Rightarrow (G_{i,a+1} = r) \Rightarrow (G_{i,a+2} = r) \Rightarrow \dots \Rightarrow (G_{i,a+r_i-1} = r)$,

2. $G_{i,b-r_i} = r, \dots, G_{i,a+r_i-1} = r$, and

3. $(G_{i,a} = r) \Leftrightarrow \overline{(G_{i,a+r_i} = r)}$, $(G_{i,a+1} = r) \Leftrightarrow \overline{(G_{i,a+r_i+1} = r)}$, $\dots$, $(G_{i,b-r_i-1} = r) \Leftrightarrow \overline{(G_{i,b-1} = r)}$

are fulfilled.

Analogously to the first Density Property, the conditions of the second Density
Property can be formulated as a set of 2SAT clauses. All feasible label placements
are $(G_{i,a} = r, G_{i,a+1} = r, \dots, G_{i,a+r_i-1} = r)$, $(G_{i,a+1} = r, G_{i,a+2} = r, \dots, G_{i,a+r_i} = r)$,
$(G_{i,a+2} = r, G_{i,a+3} = r, \dots, G_{i,a+r_i+1} = r)$, $\dots$, $(G_{i,b-r_i} = r, G_{i,b-r_i+1} = r, \dots, G_{i,b-1} = r)$.
Note that the properties work analogously for columns.
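
That a clause set of this kind admits exactly the feasible placements can be verified by brute force on small intervals. The sketch below assumes one particular reading of the three conditions: an implication chain over the first r fields of the interval, unit clauses for the fields covered by every placement, and an exclusive-or between fields that lie r positions apart. The literal encoding and all function names are illustrative assumptions.

```python
from itertools import product

def density_two_clauses(a, b, r):
    """Clauses (at most two literals each) for one interval [a, b) with r <= b - a <= 2r."""
    clauses = []
    for x in range(a, a + r - 1):             # condition 1: (x = r) => (x+1 = r)
        clauses.append(((x, False), (x + 1, True)))
    for x in range(b - r, a + r):             # condition 2: fields covered by every placement
        clauses.append(((x, True),))
    for x in range(a, b - r):                 # condition 3: (x = r) <=> not (x+r = r)
        clauses.append(((x, True), (x + r, True)))
        clauses.append(((x, False), (x + r, False)))
    return clauses

def models(clauses, a, b):
    """All satisfying assignments, each given as the sorted list of fields set to r."""
    found = []
    for values in product([False, True], repeat=b - a):
        assign = dict(zip(range(a, b), values))
        if all(any(assign[x] == want for x, want in cl) for cl in clauses):
            found.append([x for x in range(a, b) if assign[x]])
    return found

def placements(a, b, r):
    """All ways to put a label of length r into [a, b)."""
    return [list(range(s, s + r)) for s in range(a, b - r + 1)]
```

Comparing `models` with `placements` for all small interval lengths between r and 2r gives a quick check that the encoding neither loses nor invents a placement.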

Theorem 1. The 2SAT formula of all dense rows and columns can be created in $O(nm)$
time. The 2SAT instance can be solved in $O(nm)$ time.

Proof: The number of variables is limited by $2nm$. For each dense row we have at
most |n clauses. Analogously, for each dense column we have at most |m clauses.
Altogether we have $O(nm)$ clauses. Thus, the satisfiability of the 2SAT instance can be
checked in $O(nm)$ time [EIS76]. □
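
A linear-time 2SAT test can be sketched as follows. Note that this is not the algorithm of [EIS76] cited above; the sketch swaps in the well-known alternative based on strongly connected components of the implication graph, which is also linear in the number of clauses and variables. The literal encoding (signed integers) and the function name are assumptions.

```python
def solve_2sat(num_vars, clauses):
    """Return a satisfying assignment (list of bools) or None.

    A literal is an int: +v means variable v is true, -v means it is false
    (v = 1..num_vars). A unit clause (x) can be passed as (x, x).
    """
    def node(lit):                       # literal -> node of the implication graph
        return 2 * (abs(lit) - 1) + (lit < 0)

    size = 2 * num_vars
    graph = [[] for _ in range(size)]
    reverse = [[] for _ in range(size)]
    for x, y in clauses:                 # (x or y) gives  not x -> y  and  not y -> x
        for src, dst in ((node(-x), node(y)), (node(-y), node(x))):
            graph[src].append(dst)
            reverse[dst].append(src)

    order, seen = [], [False] * size
    for root in range(size):             # first pass: record the finishing order
        if seen[root]:
            continue
        seen[root] = True
        stack = [(root, iter(graph[root]))]
        while stack:
            v, it = stack[-1]
            advanced = False
            for w in it:
                if not seen[w]:
                    seen[w] = True
                    stack.append((w, iter(graph[w])))
                    advanced = True
                    break
            if not advanced:
                order.append(v)
                stack.pop()

    comp, label = [-1] * size, 0
    for root in reversed(order):         # second pass on the reversed graph
        if comp[root] != -1:
            continue
        comp[root] = label
        stack = [root]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if comp[w] == -1:
                    comp[w] = label
                    stack.append(w)
        label += 1

    result = []
    for v in range(num_vars):
        if comp[2 * v] == comp[2 * v + 1]:
            return None                  # v and not v are equivalent: unsatisfiable
        result.append(comp[2 * v] > comp[2 * v + 1])
    return result
```

Components are numbered in topological order of the implication graph, so a variable is set to true exactly when its positive literal lies in a later component than its negation.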

Procedure 2 calls Procedure 1, our preprocessing. In case of success, all dense rows
and columns are encoded as a set of 2SAT clauses with the aid of Density Properties 1
and 2. Then, their solvability is checked, e.g. by invoking the 2SAT algorithm of Even,
Itai and Shamir [EIS76].

Lemma 1. The PREPROCESSING Procedure 1 and the DRAW_CONCLUSIONS Procedure 2
can be implemented to use at most $O(nm(n + m))$ time.

Procedure 2 DRAW_CONCLUSIONS (G, R, C, n, m)

    if PREPROCESSING(G, R, C, n, m) then {
        F := the set of 2SAT clauses of the dense rows and columns;
        if F is satisfiable then return true;
    }
    return false;



Proof: The rules only need to be applied to those rows and columns in which an entry
was previously set to $r$ or $c$. A setting of a field $G_{i,j}$ can only cause new settings in row
$i$ or column $j$, which by themselves can again cause new settings. The application of
the rules on a row and a column takes time $O(n + m)$. Since at most $2nm$ fields can
be set, we obtain that the preprocessing can be implemented such that its running time
is $O(nm(n + m))$. In Theorem 1 we proved that the 2SAT clauses can be generated
and checked for solvability in $O(nm)$ time. Thus, we have a worst case time bound of
$O(nm(n + m))$. □

Thus, we can solve dense problems:

Theorem 2. In case that each row and each column of a preprocessed labeling instance
$(G, R, C, n, m)$ fulfills either Density Property 1 or 2, Procedure 2 DRAW_CONCLUSIONS
decides if the instance is solvable. In case of solvability we can generate
a valid labeling from a truth assignment. The overall running time is bounded by
$O(nm(n + m))$.

4 A General Algorithm 

In this section we describe an algorithmic approach with a backtracking component that
solves any label problem. Empirically it uses its backtracking ability in a strictly limited
way such that its practical runtime stays in the polynomial range. After performing the
PREPROCESSING and the satisfiability test for dense rows and columns (see Procedure 2
DRAW_CONCLUSIONS), we adaptively generate a tree that encodes all possible label
settings of the label problem. Each node in the first level of the search tree corresponds
to a possible label setting for the first row label. In the $i$-th level the nodes correspond
to the possible label settings for the $i$-th row, depending on the label settings of all
predecessor rows. Thus, we have at most $m$ possible positions for a row label and at
most $n$ levels. Our algorithm searches for a valid label setting in this tree by traversing
the tree depth-first, generating the children of a node when necessary.

In the algorithm, we preprocess matrix $G$ and check the solvability of the dense
rows and columns by invoking Procedure 2 DRAW_CONCLUSIONS. We further mark
all these settings permanently. When we branch on a possible label setting for a row, we
increase the global timestamp, draw all conclusions this setting has for the other labels
by invoking Procedure 2 DRAW_CONCLUSIONS, and timestamp each new setting. These
consequences can be a limitation on the possible positions of a label or even the
impossibility of positioning a label without conflicts. After that, we select one of the newly
generated children, increase the timestamp and again timestamp all implications. When
a conflict occurs, the process resumes from the deepest of all nodes left behind, namely,
from the nearest decision point with unexplored alternatives. We mark all timestamps
invalid that correspond to nodes that lie on a deeper level than the decision point. This
brings the matrix $G$ into its previous state without storing each state separately. Suppose the
algorithm returns a valid label setting for all rows. Since Procedure 1 ensures that each
column $i$ contains an interval of length at least $c_i$ that is free for column labeling, we
can simply label each column and obtain a valid label setting. The algorithm is given in
Algorithm 1 and in Procedures 1, 2, and 3.



Algorithm 1 LABEL (G, R, C, n, m)

    timestamp := 1;
    if DRAW_CONCLUSIONS(G, R, C, n, m) yields a conflict
        return “not solvable”;
    timestamp each setting;
    let w be the first row that is not yet labeled;
    if POSITION_AND_BACKTRACK(w, G, R, C, n, m, timestamp) {
        label all columns that are not yet completely labeled;
        return G;
    }
    else
        return “not solvable”;



Procedure 3 POSITION_AND_BACKTRACK (w, G, R, C, n, m, timestamp)

    while there are untested possible positions for label R_w in row w {
        local_timestamp := timestamp := timestamp + 1;
        label row w with R_w in one of these positions;
        timestamp each new setting;
        if DRAW_CONCLUSIONS(G, R, C, n, m) then {
            timestamp each new setting;
            if there is a row w' that is not yet labeled {
                if POSITION_AND_BACKTRACK(w', G, R, C, n, m, timestamp) then
                    return true;
            }
            else return true;
        }
        timestamp each new setting;
        mark local_timestamp invalid;
    }
    return false;



We implemented the backtracking algorithm and tested it on over 10000 randomly
generated labeling instances with $n$ and $m$ at most 100. After at most one backtracking
step per branch the solvability of any instance was decided. The algorithm is constructive;
for each solvable instance a labeling was produced. This makes it reasonable to
study the worst case run time behavior of the algorithm with backtracking depth one.
The algorithm behaves worst when each label is positioned first in all places
that cause a conflict, before it is positioned in a conflict-free place. A row label can be
positioned in at most $m$ different places. Each time a label is positioned, Procedure 2
DRAW_CONCLUSIONS is called, which needs at most $O(nm(n + m))$ time.
Thus, the time for positioning a row label is bounded by $O(nm^2(n + m))$. Since $n$
rows have to be labeled, the backtracking approach with backtracking depth one needs
at most $O(n^2m^2(n + m))$ time. If the assumption of limited backtracking behavior does
not hold, the runtime is exponential.



5 Complexity Status

Although Unger and Seibert recently proved the NP-completeness of the label problem
[US00], we now show that a slight generalization, namely the labeling of a cylinder-shaped
downtown, is NP-hard. We do so because our reduction could be helpful in understanding
the complexity of the original problem. In addition, it is quite intuitive and much shorter
than that of Unger and Seibert. Instead of labeling an array we now label a cylinder
consisting of $n$ cyclic rows and $m$ columns. Figure 2 shows an example of a cylinder
instance. We show that this problem is NP-complete by reducing a version of the Three
Partition problem to it. Our proof is similar to an NP-completeness proof of Woeginger
[Woe96] about the reconstruction of polyominoes from their orthogonal projections.
Woeginger showed that the reconstruction of a two-dimensional pattern from its two
orthogonal projections $H$ and $V$ is NP-complete when the pattern has to be horizontally
and vertically convex. This and other related problems, also discussed in [Woe96], show
up in the area of discrete tomography.

Fig. 2. Cylinder Label problem.

Definition 2 (Cylinder Label problem $(Z, R, C, n, m)$).

Instance: Let $Z_{n,m}$ be a cylinder consisting of $n$ cyclic rows and $m$ columns. Let $R_i$
be the label of the $i$-th row and let $r_i$ be the length of label $R_i$, $1 \le i \le n$. Let $C_i$ be
the label of the $i$-th column and let $c_i$ be the length of label $C_i$, $1 \le i \le m$.

Problem: For each row $i$ set $r_i$ consecutive fields of $Z_{i,\cdot}$ to $r$, and for each column $j$ set $c_j$
consecutive fields of $Z_{\cdot,j}$ to $c$.

Our reduction is done from the following version of the NP-complete Three Partition
problem [GJ79].

Problem 1 (Three Partition).

Instance: Positive integers $a_1, \dots, a_{3k}$ that are encoded in unary and that fulfill the two
conditions (i) $\sum_{i=1}^{3k} a_i = k(2B + 1)$ for some integer $B$, and (ii) $(2B + 1)/4 < a_i < (2B + 1)/2$
for $1 \le i \le 3k$.

Problem: Does there exist a partition of $a_1, \dots, a_{3k}$ into $k$ triples such that the
elements of every triple add up to exactly $2B + 1$?



Theorem 3. The Cylinder Label problem is NP-complete.

Proof: Cylinder Label problem $\in$ NP: The Cylinder Label problem is in NP since
it is easy to check whether a given solution solves the problem or not.

Transformation: Now let an instance of Three Partition be given. From this instance we
construct a Cylinder Label problem consisting of $n = k(2B + 2)$ rows and $m = 3k$
columns. The vector $r$ defining the row label lengths is of the form

\[ (m, \underbrace{m-1, \dots, m-1}_{(2B+1)\text{-times}}, m, \underbrace{m-1, \dots, m-1}_{(2B+1)\text{-times}}, \dots). \]

Since a row label of length $m$ occupies the whole row, those rows with label length
$m$ have no free space for column labeling. Therefore the rows with label length $m$
subdivide the rows into $k$ blocks, each containing $2B + 1$ rows, each of which has one
entry that is free for column labeling when the row is labeled. The vector defining the
column label lengths is of the form

\[ (a_1, a_2, \dots, a_{3k}). \]

The transformation clearly is polynomial.

The Three Partition instance has a solution $\Leftrightarrow$ the Cylinder Label instance has a
solution:

“$\Rightarrow$”: Let $(x_1, y_1, z_1), \dots, (x_k, y_k, z_k)$ be a partition of $a_1, \dots, a_{3k}$ into $k$ triples such
that $x_i + y_i + z_i = 2B + 1$, $1 \le i \le k$. For each $i$, $(x_i, y_i, z_i) = (a_f, a_g, a_h)$ for some
indices $f$, $g$, and $h$, $1 \le f, g, h \le 3k$. We now label the columns $f$, $g$, and $h$ one below
the other in the $i$-th block of rows. More precisely, in column $f$ we set the fields in rows
$(i-1)(2B+2)+2, \dots, (i-1)(2B+2)+1+a_f$ to $c$. In column $g$ we set the fields in rows
$(i-1)(2B+2)+2+a_f, \dots, (i-1)(2B+2)+1+a_f+a_g$ to $c$. In column $h$ we set the fields in
rows $(i-1)(2B+2)+2+a_f+a_g, \dots, (i-1)(2B+2)+1+a_f+a_g+a_h$ to $c$.
It then follows that the rows $j(2B + 2) + 1$ are free for row labeling, for $0 \le j < k$.
Thus, we can label them with their labels of length $3k = m$. All other rows have exactly
one entry occupied by a column label. Since the rows are cyclic we can label each of
these rows with a label of length $3k - 1$.

“$\Leftarrow$”: Let $Z$ be a solution of the Cylinder Label instance. Each row contains at most one
entry that is occupied by a column label. Each column label has length $(2B + 1)/4 < a_i < (2B + 1)/2$,
$1 \le i \le 3k$. Therefore, exactly three columns are labeled in the rows
$j(2B + 2) + 2, \dots, (j + 1)(2B + 2)$, for $0 \le j \le k - 1$. Furthermore, the label lengths
of each triple sum up to $2B + 1$, so the triples partition $a_1, \dots, a_{3k}$ as required. Thus $Z$ solves
the Three Partition instance. □
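
The transformation and the forward direction of the proof can be sketched as follows. Rows and columns are indexed from 0, the triples are given as index triples into the list of integers, and the function names are assumptions made for illustration.

```python
def cylinder_instance(a, B):
    """Row and column label lengths of the cylinder instance built from a_1, ..., a_3k."""
    k, m = len(a) // 3, len(a)
    if len(a) != 3 * k or sum(a) != k * (2 * B + 1):
        raise ValueError("not a Three Partition instance of the required form")
    row_len = []
    for _ in range(k):
        row_len += [m] + [m - 1] * (2 * B + 1)   # one full row, then a block of 2B+1 rows
    return row_len, list(a)

def column_cells_per_row(a, B, triples):
    """Number of fields per row occupied by column labels when each triple is stacked in its block."""
    row_len, col_len = cylinder_instance(a, B)
    used = [0] * len(row_len)
    for block, triple in enumerate(triples):
        row = block * (2 * B + 2) + 1            # first row below the full row of this block
        for col in triple:
            for _ in range(col_len[col]):
                used[row] += 1
                row += 1
    return row_len, used
```

For a valid partition every row of length m - 1 ends up with exactly one field taken by a column label and every row of length m with none, so each cyclic row has just enough room for its own label.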
6 Solvable Special Cases

In the following section we derive an $O(nm)$ time algorithm for the special case where
no label is longer than half of its street length. We think that this case applies especially
to large downtown maps, where the labels are short with respect to the street length.
In Section 6.2 we solve the label problem when all vertical labels are of equal length. In
many American cities the streets in one orientation (e.g. north-south) are simply called
1st Avenue, 2nd Avenue, .... These names all have the same label length and thus the
label problem can be solved with the algorithm in Section 6.2. Another solvable special
case is the following: we are given a quadratic map, where each label has one of two
label lengths. Such an instance has a solution if and only if no conflict occurred in the
PREPROCESSING Procedure 1. Due to space limitations and the length of the algorithm
and its proof, we refer to the technical report version of this paper for this special case [NW99].
The algorithms of the last two cases have runtime $O(nm(n + m))$.



6.1 Half Size 

Let $(G, R, C, n, m)$ be a label problem. In this section we study the case where each
row label has length at most $\lfloor \frac{m}{2} \rfloor$ and each column label has length at most $\lfloor \frac{n}{2} \rfloor$. We
show that the label problem is always solvable in this case.

Algorithm 2 HALF_SOLUTION (G, R, C, n, m)

    label the rows 1, ..., ⌊n/2⌋ leftmost;
    label the rows ⌊n/2⌋ + 1, ..., n rightmost;
    label the columns 1, ..., ⌊m/2⌋ bottommost;
    label the columns ⌊m/2⌋ + 1, ..., m topmost;

Theorem 4. Let $(G, R, C, n, m)$ be a label problem. Let $r_i \le \lfloor \frac{m}{2} \rfloor$ and let $c_j \le \lfloor \frac{n}{2} \rfloor$.
Then, Algorithm 2 computes a solution to the label problem (Definition 1) in $O(nm)$ time.

Proof: Take a look at Figure 3.
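
The pinwheel placement of Algorithm 2 is short enough to write down directly. The sketch below uses 0-based indices and takes row 0 as the top row; the function names and the overlap check are assumptions added for illustration.

```python
def half_solution(n, m, row_len, col_len):
    """Start positions for all labels, assuming row_len[i] <= m // 2 and col_len[j] <= n // 2."""
    # Upper rows go leftmost, lower rows go rightmost.
    row_start = [0 if i < n // 2 else m - row_len[i] for i in range(n)]
    # Left columns go bottommost (ending at row n - 1), right columns go topmost.
    col_start = [n - col_len[j] if j < m // 2 else 0 for j in range(m)]
    return row_start, col_start

def has_overlap(n, m, row_len, col_len, row_start, col_start):
    """True if some field is covered by both a row label and a column label."""
    for i in range(n):
        for j in range(row_start[i], row_start[i] + row_len[i]):
            if col_start[j] <= i < col_start[j] + col_len[j]:
                return True
    return False
```

The upper rows only touch the left half of the grid, where the column labels sit at the bottom, and the lower rows only touch the right half, where the column labels sit at the top, which is why no overlap can occur.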



6.2 Constant Vertical Street Length 

In this section we consider the special case of the label problem $(G, R, C, n, m)$ where
all column labels have the same length $l$. We denote this problem by $(G, R, C, n, m, l)$. We
show that we can decide whether the label problem $(G, R, C, n, m, l)$ is solvable or
not. We further give a simple algorithm that labels a solvable instance correctly. All
results of this section carry over to the constant row length case.

Theorem 5 (Constant Column Length). Let $(G, R, C, n, m, l)$ be a label problem
with $c_i = l$, $1 \le i \le m$. The instance is solvable if and only if no conflict occurred in
the PREPROCESSING Procedure 1.

Fig. 3. Solution of a typical half size label problem according to Algorithm 2.

Fig. 4. Typical downtown map where the vertical street names have constant length.



We assume that I < Otherwise, row contains no fields that are free for 
row labeling. The next lemma states that the preoccupied fields are symmetric to the 
vertical central axis of G. 

Lemma 2. Let $(G, R, C, n, m, l)$ be a successfully preprocessed label problem. After
the preprocessing each row has the form $[aba]$, where
$G_{i,1} = \emptyset, \dots, G_{i,a} = \emptyset$; $G_{i,a+1} = x, \dots, G_{i,a+b} = x$; $G_{i,a+b+1} = \emptyset, \dots, G_{i,m} = \emptyset$
for $x = r$ or $x = c$, $m \ge b \ge 0$ and $2a + b = m$.

Proof: Initially we have $G_{i,j} = \emptyset$ for $1 \le i \le n$, $1 \le j \le m$. Executing Rule 3.2 on
each row $i$ with length $r_i > \frac{m}{2}$ yields $G_{i,m-r_i+1} = r, \dots, G_{i,r_i} = r$. Thus, all $r$-entries
of $G$ are symmetric to the vertical mid axis of $G$. Remember that each column has label
length $l$. Therefore, executing Rule 3.2 on each column $j$ yields that for each entry $G_{i,j}$
that is set to $c$ we also have $G_{i,j+1} = c, \dots, G_{i,m-j+1} = c$, if $j \le \frac{m}{2}$; and $G_{i,m-j+1} = c, \dots, G_{i,j-1} = c$,
if $\frac{m}{2} < j \le m$. Therefore, all $c$-entries of $G$ are symmetric to
the central vertical axis of $G$. Thus, until now each row $i$ has the form $[aba]$, where
$G_{i,1} = \emptyset, \dots, G_{i,a} = \emptyset$; $G_{i,a+1} = x, \dots, G_{i,a+b} = x$; $G_{i,a+b+1} = \emptyset, \dots, G_{i,m} = \emptyset$
for $x \in \{r, c\}$, $b \ge 0$, $2a + b = m$ and $1 \le i \le n$. Assume that the repeated execution
of Rule 3.2 on row $i$ of form $[aba]$ and $x = c$ does change an entry of $G_{i,\cdot}$. In this case
$a < r_i$ and it follows that the instance is not solvable. Therefore, the repeated execution
of Rule 3.2 on a column cannot change the instance and the lemma follows. □



Lemma 3. Let $(G, R, C, n, m, l)$ be a successfully preprocessed label problem. Then
Algorithm 3 computes a feasible solution to $(G, R, C, n, m, l)$ in $O(nm(n + m))$ time.

Proof: Since $(G, R, C, n, m, l)$ is preprocessed successfully it follows that each column
contains an interval of length at least $l$ that is free for column labeling. Assume that after
processing steps 2-3 there exists a row $i$ not containing an interval of length $r_i$ that is
free for row labeling. We make a case distinction according to the length of $R_i$:

Algorithm 3 CONSTANT_COLUMN_LENGTH (G, R, C, n, m, l)

1) if PREPROCESSING(G, R, C, n, m) {

2) label the columns 1, . . . , — 1 bottommost: 

3) label the columns [ , ■ ■ ■ , Tn, topmost; 

4) label the rows 1, ... ,n in the free space: 

5) } 



Case $r_i > \lfloor \frac{m}{2} \rfloor$: We know that the fields $G_{i,m-r_i+1} = r, \dots, G_{i,r_i} = r$ were set in the
preprocessing. Furthermore, Lemma 2 yields that no other entry of $G_{i,\cdot}$ was set to $c$
in the preprocessing. Therefore, each column $G_{\cdot,j}$ with $1 \le j < m - r_i + 1$ or
$r_i + 1 \le j \le m$ contains either one interval of length at least $2l$ that is free for column
labeling or two intervals, each of length at least $l$, that are free for column labeling.
From the symmetry of the label problem and since the column labels of the columns
$j$ with $1 \le j < m - r_i + 1$ are labeled bottommost and the labels $j$ with $r_i + 1 \le j \le m$
are labeled topmost, it follows that either the fields $G_{i,1}, \dots, G_{i,m-r_i+1}$ are
free for row labeling or the fields $G_{i,r_i+1}, \dots, G_{i,m}$. Thus, $G_{i,\cdot}$ contains an interval
of length $r_i$ that is free for row labeling. Contradiction.

Case $r_i \le \lfloor \frac{m}{2} \rfloor$: Lemma 2 yields that this row has the form $[aba]$ with
$G_{i,1} = \emptyset, \dots, G_{i,a} = \emptyset$, $G_{i,a+1} = c, \dots, G_{i,a+b} = c$, $G_{i,a+b+1} = \emptyset, \dots, G_{i,m} = \emptyset$, $b \ge 0$ and
$2a + b = m$. Since the instance is solvable it follows that $a \ge r_i$. With the same
arguments as above it follows that either the fields $G_{i,1}, \dots, G_{i,r_i}$ are free for row
labeling or the fields $G_{i,m-r_i+1}, \dots, G_{i,m}$ are free for row labeling. Contradiction.

The running time is dominated by the preprocessing and thus is $O(nm(n + m))$. □

See Figures 5 and 6 for an example. Figure 4 shows a typical downtown city map in
which all vertical streets have the same length.




Fig. 5. Matrix of a constant column length
problem after the successful preprocessing. Entries
that are set in the preprocessing are colored
black and gray.




Fig. 6. Solution of the label problem of the left
figure.

The Online Dial-a-Ride Problem under Reasonable Load

Abstract. In this paper, we analyze algorithms for the online dial-a-
ride problem with request sets that fulfill a certain worst-case restriction: 
roughly speaking, a set of requests for the online dial-a-ride problem is 
reasonable if the requests that come up in a sufficiently large time period 
can be served in a time period of at most the same length. This new 
notion is a stability criterion implying that the system is not overloaded. 
The new concept is used to analyze the online dial-a-ride problem for 
the minimization of the maximal resp. average flow time. Under reason- 
able load it is possible to distinguish the performance of two particular 
algorithms for this problem, which seems to be impossible by means of 
classical competitive analysis. 



1 Introduction 

It is a standard assumption in mathematics, computer science, and operations 
research that problem data are given. However, many aspects of life are online. 
Decisions have to be made without knowing future events relevant for the current 
choice. Online problems, such as vehicle routing and control, management of call 
centers, paging and caching in computer systems, foreign exchange and stock 
trading, had been around for a long time, but no theoretical framework existed 
for the analysis of online problems and algorithms. 

Meanwhile, competitive analysis has become the standard tool to analyze 
online-algorithms [4,6]. Often the online algorithm is supposed to serve the re- 
quests one at a time, where a next request becomes known when the current 
request has been served. However, in cases where the requests arrive at certain 
points in time this model is not sufficient. In [3,5] each request in the request 
sequence has a release time. The sequence is assumed to be in non-decreasing 
order of release times. This model is sometimes referred to as the real time 
model. A similar approach was used in [1] to investigate the online dial-a-ride 
problem — OlDarp for short — which is the example for the new concept in this 
paper. 

Since in the real time model the release of a new request is triggered by a
point in time rather than a decision of the online algorithm we essentially do not
need a total order on the set of requests. Therefore, for the sake of convenience,
we will speak of request sets rather than request sequences.

In the problem OlDarp objects are to be transported between points in a
given metric space $X$ with the property that for every pair of points $x, y \in X$
there is a path $p : [0, 1] \to X$ in $X$ with $p(0) = x$ and $p(1) = y$ of length $d(x, y)$.
An important special case occurs when $X$ is induced by a weighted graph.

A request consists of the objects to be transported and the corresponding 
source and target vertex of the transportation request. The requests arrive online 
and must be handled by a server which starts and ends its work at a designated 
origin. The server picks up and drops objects at their starts and destinations. It 
is assumed that neither the release time of the last request nor the number of 
requests is known in advance. 

A feasible solution to an instance of the OlDarp is a schedule of moves (i.e., 
a sequence of consecutive moves in X together with their starting times) in X 
so that every request is served and that no request is picked up before its release 
time. The goal of OlDarp is to find a feasible solution with “minimal cost”, 
where the notion of “cost” depends on the objective function used. The focus of 
this paper is the investigation of the notoriously difficult task to minimize the 
maximal or average flow time online. 

Recall that an online-algorithm A is called c-competitive if there exists a
constant $c$ such that for any request set $\sigma$ (or request sequence $\sigma$ if we are
concerned with the classical online model) the inequality $A(\sigma) \le c \cdot \mathrm{OPT}(\sigma)$ holds.
Here, $X(\sigma)$ denotes the objective function value of the solution produced by
algorithm $X$ on input $\sigma$ and OPT denotes an optimal offline algorithm. Sometimes
we are dealing with various objectives at the same time. We then indicate the
objective obj in the superscript, as in $X^{\mathrm{obj}}(\sigma)$.

Competitive analysis of OlDarp provided the following (see [1]): 

— There are competitive algorithms (IGNORE and REPLAN, definitions see 
below) for the goal of minimizing the total completion time of the schedule. 

— For the task of minimizing the maximal (average) waiting time or the maximal
(average) flow time there can be no algorithm with constant competitive
ratio. In particular, the algorithms IGNORE and REPLAN have an unbounded
competitive ratio.

We do not claim originality for the actual online-algorithms IGNORE and RE- 
PLAN; instead we show a new method for their analysis. As the reader will see in 
the definitions, both REPLAN and IGNORE are straight-forward online strategies 
based on the ability to solve the offline version of the problem to optimality or a 
constant-factor approximation thereof (with respect to the minimization of the 
total completion time). 

The first occurrence of the strategy IGNORE, to the best of our knowledge,
can be found in the paper by Shmoys, Wein, and Williamson [13]. They show a
fairly general result about obtaining competitive algorithms for minimizing the
total completion time in machine scheduling problems when the jobs arrive over
time: If there is a $\rho$-approximation algorithm for the offline version, then this
implies the existence of a $2\rho$-competitive algorithm for the online version, which
is essentially the IGNORE strategy. The results from [13] show that IGNORE-type
strategies are 2-competitive for a number of online scheduling problems.
The strategy REPLAN is probably folklore; it can also be found under different
names like REOPT or OPTIMAL.

It should be noted that the corresponding offline-problems with release times 
(where all requests are known at the start of the algorithm) are NP-hard to solve 
for the objective functions of minimizing the average or maximal flow time — it is 
even NP-hard to find a solution within a constant factor from the optimum [11]. 
The offline problem without release times of minimizing the total completion 
time is polynomially solvable on special graph classes but NP-hard in general 
[8,2,7,10]. 

If we are considering a continuously operating system with continuously ar- 
riving requests (i.e., the request set may be infinite) then the total completion 
time is meaningless. Bottom-line: in this case, the existing positive results cannot 
be applied and the negative results tell us that we cannot hope for performance 
guarantees that may be relevant in practice, such as bounds for the maximal 
or average flow time. In particular, the two algorithms IGNORE and REPLAN 
cannot be distinguished by classical competitive analysis because it is easy to see 
that no online-algorithm can have a constant competitiveness ratio with respect 
to minimizing the maximal or average flow time. 

The point here is that we do not know any notion from the literature to 
describe what a particular set of requests should look like in order to allow for a 
continuously operating system. In queuing theory this is usually modelled by a 
stability assumption: the rate of incoming requests is at most the rate of requests 
served. To the best of our knowledge, so far there has been nothing similar in the 
existing theory of discrete online-algorithms. Since in many instances we have no 
exploitable information about the distributions of requests we want to develop a 
worst-case model rather than a stochastic model for stability of a continuously 
operating system. 

Our idea is to introduce the notion of $\Delta$-reasonable request sets. A set of
requests is $\Delta$-reasonable if, roughly speaking, requests released during a period
of time $\delta \ge \Delta$ can be served in time at most $\delta$. A set of requests $R$ is reasonable
if there exists a $\Delta < \infty$ such that $R$ is $\Delta$-reasonable. That means, for non-reasonable
request sets we find arbitrarily large periods of time where requests
are released faster than they can be served, even if the server has the optimal
offline schedule. When a system has only to cope with reasonable request sets, we
call this situation reasonable load. Section 3 is devoted to the exact mathematical
setting of this idea.

We now present our main result on the OlDarp under $\Delta$-reasonable load, which
we prove in Sects. 4 and 5.

Theorem 1. For the OlDarp under $\Delta$-reasonable load, IGNORE yields a maximal
and an average flow time of at most $2\Delta$, whereas the maximal and the
average flow time of REPLAN are unbounded.

The algorithms IGNORE and REPLAN have to solve a number of offline instances
of OlDarp, minimizing the total completion time, which is in general
NP-hard, as we already remarked. We will show how we can derive results for
IGNORE when using an approximation algorithm for solving offline instances of
OlDarp (for approximation algorithms for offline instances of OlDarp, refer
to [8,2,7,10]). For this we refine the notion of reasonable request sets again,
introducing a second parameter that tells us how “fault tolerant” the request
set is. In other words, the second parameter tells us how “good” the algorithm
has to be to show stable behavior. Again, roughly speaking, a set of requests is
$(\Delta,\rho)$-reasonable if requests released during a period of time $\delta \ge \Delta$ can be served
in time at most $\delta/\rho$. If $\rho = 1$, we get the notion of $\Delta$-reasonable as described
above. For $\rho > 1$, the algorithm is allowed to work “sloppily” (e.g., employ approximation
algorithms) or have breakdowns to an extent measured by $\rho$ and
still show a stable behavior.

Note that our performance guarantee is with respect to the “reasonableness”
$\Delta$ of the input set, not with respect to an optimal offline solution. One might
ask whether IGNORE is competitive with respect to minimizing the maximal or
average flow time. This follows trivially from our main result if the length of a
single request is bounded from below; we leave it as an exercise for the reader to
show that without this assumption there can be no competitive online algorithm
for these objective functions, even under reasonable load.

The algorithms under investigation compute offline locally optimal schedules 
with respect to the minimization of the total completion time in order to glob- 
ally minimize the maximal or average flow times. This is of practical relevance 
because minimizing the total completion time offline is easier than minimizing 
the maximal or average flow time offline (see [11]). It is an open problem whether 
locally optimal schedules minimizing the maximal or average flow time yield bet- 
ter results. However, then the locally optimal schedules would be much harder 
to compute. Thus, such algorithms would not be feasible in practice. 

2 Preliminaries 

Let us first sketch the problem under consideration. We are given a metric space
$(X, d)$. Moreover, there is a special vertex $o \in X$ (the origin). Requests are
triples $r = (t, a, b)$, where $a$ is the start point of a transportation task, $b$ its
end point, and $t$ its release time, which is, in this context, the time where $r$
becomes known. A transportation move is a quadruple $m = (t, a, b, R)$, where $a$
is the starting point and $b$ the end point, and $t$ the starting time, while $R$ is the
set (possibly empty) of requests carried by the move. The arrival time of a move
is the sum of its starting time and $d(a, b)$. A (closed) transportation schedule is
a sequence $(m_1, m_2, \ldots)$ of transportation moves such that

— the first move starts in the origin $o$;

— the starting point of $m_i$ is the end point of $m_{i-1}$;

— the starting time of $m_i$ carrying $R$ is no earlier than the maximum of the
arrival time of $m_{i-1}$ and the release times of all requests in $R$;

— the last move ends in the origin $o$.

An online-algorithm for OlDarp has to move a server in $X$ so as to fulfill all
released transportation tasks without preemption (i.e., once an object has been
picked up it is not allowed to be dropped at any other place than its destination),
while it does not know about requests that come up in the future. In order to plan
the work of the server, the online-algorithm may maintain a preliminary (closed)
transportation schedule for all known requests, according to which it moves the
server. A posteriori, the moves of the server induce a complete transportation
schedule that may be compared to an offline transportation schedule that is
optimal with respect to some objective function (competitive analysis).
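
The four conditions on a closed transportation schedule can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of the original paper: the metric space is taken to be the real line with distance |x - y|, requests are identified by hypothetical string ids, and the function name `is_closed_schedule` is ours. It only checks the four listed conditions; it does not verify that carried requests are actually picked up and delivered.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# A move is (start_time, start_point, end_point, carried_request_ids).
Move = Tuple[float, float, float, FrozenSet[str]]


def is_closed_schedule(
    moves: List[Move],
    release: Dict[str, float],
    origin: float = 0.0,
    dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
) -> bool:
    """Check the four conditions of a closed transportation schedule."""
    if not moves:
        return False  # assumption: an empty sequence is not treated as a schedule
    # First move starts in the origin, last move ends in the origin.
    if moves[0][1] != origin or moves[-1][2] != origin:
        return False
    prev_end, prev_arrival = None, None
    for start, a, b, carried in moves:
        # Each move starts where the previous one ended.
        if prev_end is not None and a != prev_end:
            return False
        # A move may not start before the previous move has arrived
        # or before all carried requests have been released.
        earliest = max((release[r] for r in carried), default=float("-inf"))
        if prev_arrival is not None:
            earliest = max(earliest, prev_arrival)
        if start < earliest:
            return False
        prev_end, prev_arrival = b, start + dist(a, b)
    return True


release = {"r": 1.0}
good = [(0.0, 0.0, 2.0, frozenset()), (2.0, 2.0, 0.0, frozenset({"r"}))]
too_early = [(0.0, 0.0, 2.0, frozenset({"r"})), (2.0, 2.0, 0.0, frozenset())]
broken_chain = [(0.0, 0.0, 2.0, frozenset()), (2.0, 1.0, 0.0, frozenset())]
```

Here `good` satisfies all conditions, `too_early` carries request `r` before its release time, and `broken_chain` violates the start-point condition.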

We are concerned with the following objective functions: 

— The total completion time (or makespan) $A^{\mathrm{comp}}(\sigma)$ of the solution produced
by algorithm $A$ on a request set $\sigma$ is the time needed by algorithm $A$ to
serve all requests in $\sigma$.

— The maximal resp. average flow time $A^{\mathrm{maxflow}}(\sigma)$ resp. $A^{\mathrm{flow}}(\sigma)$ is the maximal
resp. average of the differences between the completion times produced
by $A$ and the release times of the requests in $\sigma$.
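
As a small illustration of the flow time objectives, the following sketch computes the maximal and average flow time from given release and completion times. The request ids and the numbers are made up for the example; the function name `flow_times` is an assumption of ours.

```python
from typing import Dict, Tuple


def flow_times(release: Dict[str, float], completion: Dict[str, float]) -> Tuple[float, float]:
    """Return (maximal flow time, average flow time) over all requests."""
    flows = [completion[r] - release[r] for r in release]
    return max(flows), sum(flows) / len(flows)


# Hypothetical data: three requests with their release and completion times.
release = {"r1": 0.0, "r2": 2.0, "r3": 5.0}
completion = {"r1": 4.0, "r2": 5.0, "r3": 6.0}
max_flow, avg_flow = flow_times(release, completion)
```

The individual flow times here are 4, 3, and 1, so the maximum is 4 and the average is 8/3.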

We start with some useful notation. 

Definition 1. The release time of a request $r$ is denoted by $t(r)$.

The offline version of $r = (t, a, b)$ is the request

$r^{\mathrm{offline}} := (0, a, b).$

The offline version of $R$ is the request set

$R^{\mathrm{offline}} := \{ r^{\mathrm{offline}} : r \in R \}.$

An important characteristic of a request set with respect to system load 
considerations is the time period in which it is released. 

Definition 2. Let $R$ be a finite request set for OlDarp. The release span $\delta(R)$
of $R$ is defined as

$\delta(R) := \max_{r \in R} t(r) - \min_{r \in R} t(r).$
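
Definitions 1 and 2 translate directly into code. The sketch below is illustrative only: the `Request` type with string-valued points and the function names are assumptions, not notation from the paper.

```python
from typing import List, NamedTuple


class Request(NamedTuple):
    t: float  # release time
    a: str    # start point
    b: str    # end point


def offline_version(requests: List[Request]) -> List[Request]:
    """Offline version: every request is released at time 0."""
    return [Request(0.0, r.a, r.b) for r in requests]


def release_span(requests: List[Request]) -> float:
    """Release span: latest release time minus earliest release time."""
    times = [r.t for r in requests]
    return max(times) - min(times)


requests = [Request(3.0, "a", "b"), Request(1.5, "b", "c"), Request(7.0, "c", "a")]
span = release_span(requests)
offline_span = release_span(offline_version(requests))
```

For this example the release span is 7.0 - 1.5 = 5.5, while the offline version has release span 0 by construction.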

Provably good algorithms exist for the total completion time and the weighted 
sum of completion times. How can we make use of these algorithms in order to 
get performance guarantees for minimizing the maximal (average) waiting (flow) 
times? We suggest a way of characterizing request sets which we want to consider
“reasonable”.

3 Reasonable Load 

In a continuously operating system we wish to guarantee that work can be
accomplished at least as fast as it is presented. In the following we propose a
mathematical set-up which models this idea in a worst-case fashion. Since we are
always working on finite subsets of the whole request set, the request set itself
may be infinite, modeling a continuously operating system.

We start by relating the release spans of finite subsets of a request set to the 
time we need to fulfill the requests. 

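Definition 3. Let $R$ be a request set for the OlDarp. A weakly monotone
function

$f : \mathbb{R} \to \mathbb{R}$

is a load bound on $R$ if for any $\delta \in \mathbb{R}$ and any finite subset $S$ of $R$ with $\delta(S) \le \delta$
the completion time $\mathrm{OPT}^{\mathrm{comp}}(S^{\mathrm{offline}})$ of the optimum schedule for the offline
version $S^{\mathrm{offline}}$ of $S$ is at most $f(\delta)$. In formula:

$\mathrm{OPT}^{\mathrm{comp}}(S^{\mathrm{offline}}) \le f(\delta).$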

Remark 1. If the whole request set $R$ is finite then there is always the trivial
load bound given by the total completion time of $R$. For every load bound $f$ we
may set $f(0)$ to be the maximum completion time we need for a single request,
and nothing better can be achieved.

A stable situation would be characterized by a load bound equal to the
identity on $\mathbb{R}$. In that case we would never get more work to do than we can
accomplish. If a request set has a load bound equal to a function $\mathrm{id}/\rho$, where id is the
identity and where $\rho > 0$, then $\rho$ measures the tolerance of the request set: An
algorithm that is by a factor $\rho$ worse than optimal will still accomplish all the
work that it gets. However, we cannot expect that the identity (or any linear
function) is a load bound for OlDarp because of the following observation: a
request set consisting of one single request has a release span of 0 whereas in
general it takes non-zero time to serve this request. In the following definition
we introduce a parameter describing how far a request set is from being
load-bounded by the identity.

Definition 4. A load bound $f$ is $(\Delta,\rho)$-reasonable for some $\Delta, \rho \in \mathbb{R}$ if

$\rho f(\delta) \le \delta$ for all $\delta \ge \Delta$.

A request set $R$ is $(\Delta,\rho)$-reasonable if it has a $(\Delta,\rho)$-reasonable load bound. For
$\rho = 1$, we say that the request set is $\Delta$-reasonable.

In other words, a load bound is $(\Delta,\rho)$-reasonable if it is bounded from above
by $\frac{1}{\rho} \cdot \mathrm{id}(x)$ for all $x \ge \Delta$ and by the constant function with value $\frac{1}{\rho}\Delta$
otherwise.
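
The condition of Definition 4 can be probed numerically. The following sketch is an illustration under assumptions of ours: it tests the inequality only on a finite list of sample points (so it can refute reasonableness but not prove it), and the names `violations` and `capped_identity` are hypothetical. Exact rational arithmetic avoids floating-point rounding in the comparison.

```python
from fractions import Fraction
from typing import Callable, List

DELTA = Fraction(3)      # assumed value of the parameter Delta
RHO = Fraction(3, 2)     # assumed value of the parameter rho


def violations(
    f: Callable[[Fraction], Fraction],
    delta_cap: Fraction,
    rho: Fraction,
    samples: List[Fraction],
) -> List[Fraction]:
    """Sample points d >= delta_cap at which rho * f(d) <= d fails."""
    return [d for d in samples if d >= delta_cap and rho * f(d) > d]


def capped_identity(d: Fraction) -> Fraction:
    """Largest candidate bound: Delta/rho below Delta, d/rho from Delta onwards."""
    return max(DELTA, d) / RHO


def identity(d: Fraction) -> Fraction:
    return d


samples = [Fraction(k, 2) for k in range(0, 21)]  # 0, 1/2, ..., 10
ok = violations(capped_identity, DELTA, RHO, samples)
bad = violations(identity, DELTA, RHO, samples)
```

The capped function passes on all samples, whereas the identity fails at every sample point from 3 onwards, because 3/2 times d exceeds d for positive d.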

Remark 2. If $\Delta$ is sufficiently small so that all request sets consisting of two or
more requests have a release span larger than $\Delta$ then the first-come-first-serve
strategy is good enough to ensure that there are never more than two unserved
requests in the system. Hence, the request set does not require scheduling the
requests in order to provide for a stable system. (By “stable” we mean that the
number of unserved requests in the system does not become arbitrarily large.)
In a sense, $\Delta$ is a measure for the combinatorial difficulty of the request set $R$.
Thus, it is natural to ask for performance guarantees for algorithms in terms of
this parameter. This is done for the algorithm IGNORE in the next section.

4 Bounds for the Flow Times of IGNORE 

We are now in a position to prove the bounds for the maximal resp. average flow
time in the OlDarp for algorithm IGNORE stated in Theorem 1. We start by
recalling the algorithm IGNORE from [1].

Definition 5 (Algorithm IGNORE). Algorithm IGNORE works with an internal
buffer. It may assume the following states (initially it is IDLE):

IDLE Wait for the next point in time when requests become available. Goto
PLAN.

BUSY While the current schedule is in work store the upcoming requests in a
buffer (“ignore them”). Goto IDLE if the buffer is empty else goto PLAN.

PLAN Produce a preliminary transportation schedule for all currently available
requests $R$ (taken from the buffer) with (approximately) minimal total
completion time for $R^{\mathrm{offline}}$. (Note: This yields a feasible
transportation schedule for $R$ because all requests in $R$ are immediately
available.) Goto BUSY.
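
The state machine of Definition 5 can be simulated in a few lines. The sketch below is a toy model under several assumptions that are ours and not from the paper: the metric space is the real line, the server carries one object at a time, requests are distinct tuples (release time, start, end), and the offline instances are solved exactly by brute force over all orders (usable only for tiny batches).

```python
from itertools import permutations
from typing import Dict, List, Tuple

Request = Tuple[float, float, float]  # (release time, start point, end point)


def offline_plan(batch: List[Request], origin: float = 0.0):
    """Brute-force shortest closed tour serving one ride at a time."""
    best = None
    for order in permutations(batch):
        pos, length = origin, 0.0
        for _, a, b in order:
            length += abs(pos - a) + abs(a - b)  # empty move, then the ride
            pos = b
        length += abs(pos - origin)              # return to the origin
        if best is None or length < best[0]:
            best = (length, order)
    return best


def ignore(requests: List[Request], origin: float = 0.0) -> Dict[Request, float]:
    """Simulate IGNORE and return the completion time of every request."""
    pending = sorted(requests)
    now, completion = 0.0, {}
    while pending:
        now = max(now, pending[0][0])                      # IDLE: wait for work
        batch = [r for r in pending if r[0] <= now]        # PLAN: buffered requests
        pending = [r for r in pending if r[0] > now]
        length, order = offline_plan(batch, origin)
        pos, t = origin, now
        for r in order:                                    # BUSY: execute the plan
            _, a, b = r
            t += abs(pos - a) + abs(a - b)
            completion[r] = t
            pos = b
        now += length                                      # back at the origin
    return completion


requests = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0), (1.0, 1.0, 3.0)]
completion = ignore(requests)
max_flow = max(completion[r] - r[0] for r in requests)
```

In this run the first request is served alone in a schedule of length 4; the two requests released at time 1 are ignored until time 4 and then served in a second schedule of length 6, which gives a maximal flow time of 9.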

We assume that IGNORE solves offline instances of OlDarp employing a
$\rho$-approximation algorithm. Recall that a $\rho$-approximation algorithm is a
polynomial-time algorithm that always finds a solution whose objective value is
at most $\rho$ times the optimum.

Let us consider the intervals in which IGNORE organizes its work in more
detail. The algorithm IGNORE induces a dissection of the time axis $\mathbb{R}$ in the
following way: We can assume, w.l.o.g., that the first set of requests arrives at
time 0. Let $\delta_0 = 0$, i.e., the point in time where the first set of requests is
released that are processed by IGNORE in its first schedule. For $i > 0$ let $\delta_i$ be
the duration of the time period the server is working on the requests that have
been ignored during the last $\delta_{i-1}$ time units. Then the time axis is split into the
intervals

$[\delta_0 = 0, \delta_0],\ (\delta_0, \delta_1],\ (\delta_1, \delta_1 + \delta_2],\ (\delta_1 + \delta_2, \delta_1 + \delta_2 + \delta_3],\ \ldots$

Let us denote these intervals by $I_0, I_1, I_2, I_3, \ldots$. Moreover, let $R_i$ be the set of those
requests that come up in $I_i$. Clearly, the complete set of requests $R$ is the disjoint
union of all the $R_i$.

At the end of each interval $I_i$ we solve an offline problem: all requests to
be scheduled are already available. The work on the computed schedule starts
immediately (at the end of interval $I_i$) and is done $\delta_{i+1}$ time units later (at the
end of interval $I_{i+1}$). On the other hand, the time we need to serve the schedule
is not more than $\rho$ times the optimal completion time of $R_i^{\mathrm{offline}}$. In other words:

Lemma 1. For all $i \ge 0$ we have

$\delta_{i+1} \le \rho \cdot \mathrm{OPT}^{\mathrm{comp}}(R_i^{\mathrm{offline}}).$

Let us now prove the first statement of Theorem 1 in a slightly more general
version.

Theorem 2. Let $\Delta > 0$ and $\rho \ge 1$. For all instances of OlDarp with $(\Delta,\rho)$-reasonable
request sets, IGNORE employing a $\rho$-approximate algorithm for solving
offline instances of OlDarp yields a maximal flow time of no more than $2\Delta$.

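Proof. Let $r$ be an arbitrary request in $R_i$ for some $i \ge 0$, i.e., $r$ is released in $I_i$.
By construction, the schedule containing $r$ is finished at the end of interval $I_{i+1}$,
i.e., at most $\delta_i + \delta_{i+1}$ time units later than $r$ was released. Thus, for all $i \ge 0$ we
get that

$\mathrm{IGNORE}^{\mathrm{maxflow}}(R_i) \le \delta_i + \delta_{i+1}.$

If we can show that $\delta_i \le \Delta$ for all $i \ge 0$ then we are done. To this end, let
$f : \mathbb{R} \to \mathbb{R}$ be a $(\Delta,\rho)$-reasonable load bound for $R$. Then
$\mathrm{OPT}^{\mathrm{comp}}(R_i^{\mathrm{offline}}) \le f(\delta_i)$ because $\delta(R_i) \le \delta_i$.

By Lemma 1, we get for all $i \ge 0$

$\delta_{i+1} \le \rho \cdot \mathrm{OPT}^{\mathrm{comp}}(R_i^{\mathrm{offline}}) \le \rho f(\delta_i) \le \max\{\delta_i, \Delta\}.$

The last inequality holds because $\rho f(\delta_i) \le \delta_i$ if $\delta_i \ge \Delta$, and
$\rho f(\delta_i) \le \rho f(\Delta) \le \Delta$ otherwise, since $f$ is weakly monotone.
Using $\delta_0 = 0$ the claim now follows by induction on $i$.

The statement of Theorem 1 concerning the average flow time of IGNORE
follows from the fact that the average is never larger than the maximum.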
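
The induction in the proof can be traced numerically. The sketch below is only an illustration: it assumes the largest load bound that is still reasonable, namely f(d) = max(Delta, d)/rho, and assumes that the inequality of Lemma 1 is tight in every step. The function name and the parameter values are ours.

```python
from fractions import Fraction
from typing import List


def worst_case_interval_lengths(delta_cap: Fraction, rho: Fraction, steps: int) -> List[Fraction]:
    """Iterate delta_{i+1} = rho * f(delta_i) with f(d) = max(delta_cap, d) / rho."""
    deltas = [Fraction(0)]                       # delta_0 = 0
    for _ in range(steps):
        f_value = max(delta_cap, deltas[-1]) / rho
        deltas.append(rho * f_value)
    return deltas


deltas = worst_case_interval_lengths(Fraction(5), Fraction(3, 2), 6)
# Flow time bound for requests released in interval i: delta_i + delta_{i+1}.
flow_bounds = [a + b for a, b in zip(deltas, deltas[1:])]
```

With Delta = 5 every interval length after the first equals 5, so the flow time bound never exceeds 10, which is twice Delta.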

Corollary 1. Let $\Delta > 0$. For all $\Delta$-reasonable request sets algorithm IGNORE
yields an average flow time of no more than $2\Delta$.

5 A Disastrous Example for REPLAN 

We first recall the strategy of algorithm REPLAN for the OlDarp. Whenever
a new request becomes available, REPLAN computes a preliminary transportation
schedule for the set $R$ of all available requests by solving the problem of
minimizing the total completion time of $R^{\mathrm{offline}}$.

Then it moves the server according to that schedule until a new request
arrives or the schedule is done. In the sequel, we provide an instance of OlDarp
and a $\Delta$-reasonable request set $R$ such that the maximal flow time
$\mathrm{REPLAN}^{\mathrm{maxflow}}(R)$ and the average flow time are unbounded, thereby proving
the remaining assertions of Theorem 1.

Theorem 3. There is an instance of OlDarp under reasonable load such that
the maximal and the average flow time of REPLAN are unbounded.

Fig. 1. A sketch of a $(2\tfrac{2}{3}\ell)$-reasonable instance of OlDarp ($\ell = 18\varepsilon$).



Proof. In Fig. 1 there is a sketch of an instance for the OlDarp. The metric
space is a path on four nodes $a, b, c, d$; the length of the path is $\ell$, the distances
are $d(a, b) = d(c, d) = \varepsilon$, and hence $d(b, c) = \ell - 2\varepsilon$. At time 0 a request from $a$
to $d$ is issued; at time $\frac{3}{2}\ell - \varepsilon$, the remaining requests periodically come in pairs
from $b$ to $a$ and from $c$ to $d$, resp. The time distance between them is $\ell - 2\varepsilon$.

We show that for $\ell = 18\varepsilon$ the request set $R$ indicated in the picture is
$2\tfrac{2}{3}\ell$-reasonable. Indeed: it is easy to see that the first request from $a$ to $d$ does not
influence reasonableness. Consider an arbitrary set $R_k$ of $k$ adjacent pairs of
requests from $b$ to $a$ resp. from $c$ to $d$. Then the release span $\delta(R_k)$ of $R_k$ is

$\delta(R_k) = (k - 1)(\ell - 2\varepsilon).$

The offline version of $R_k$ can be served in time

$\mathrm{OPT}^{\mathrm{comp}}(R_k^{\mathrm{offline}}) = 2\ell + (k - 1) \cdot 4\varepsilon.$

In order to find the smallest parameter $\Delta$ for which the request set $R_k$ is
$\Delta$-reasonable we solve $\delta(R_k) \ge \mathrm{OPT}^{\mathrm{comp}}(R_k^{\mathrm{offline}})$ for the integer $k - 1$ and get

$k - 1 = \left\lceil \frac{2\ell}{\ell - 6\varepsilon} \right\rceil = 3.$

Hence, we can set $\Delta$ to

$\Delta := \mathrm{OPT}^{\mathrm{comp}}(R_4^{\mathrm{offline}}) = 2\tfrac{2}{3}\ell.$

Now we define

$f(\delta) := \begin{cases} \Delta & \text{for } \delta < \Delta, \\ \delta & \text{otherwise.} \end{cases}$

Fig. 2. The track of the REPLAN-server. Because a new pair of requests is issued
exactly when the server is still closer to the requests at the top all the requests
at the bottom will be postponed in an optimal preliminary schedule. Thus, the
server always returns to the top when a new pair of requests arrives.



By construction, $f$ is a load bound for $R_4$. Because the time gap after which
a new pair of requests occurs is certainly larger than the additional time we need
to serve it (offline), $f$ is also a load bound for $R$. Thus, $R$ is $\Delta$-reasonable, as
desired.
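
The arithmetic behind the choice of the reasonableness parameter can be replayed with exact fractions. The sketch below only evaluates the two formulas for the release span and the offline completion time of k adjacent request pairs; the normalization of the short edge length to 1 and the function names are assumptions made for the example.

```python
from fractions import Fraction


def release_span(k: int, ell: Fraction, eps: Fraction) -> Fraction:
    """Release span of k adjacent pairs: (k - 1)(ell - 2 eps)."""
    return (k - 1) * (ell - 2 * eps)


def offline_completion(k: int, ell: Fraction, eps: Fraction) -> Fraction:
    """Offline completion time of k adjacent pairs: 2 ell + (k - 1) * 4 eps."""
    return 2 * ell + (k - 1) * 4 * eps


eps = Fraction(1)   # assumed unit for the short edges
ell = 18 * eps      # path length used in the example

# Smallest k whose release span covers its own offline completion time.
k = 1
while release_span(k, ell, eps) < offline_completion(k, ell, eps):
    k += 1

delta_cap = offline_completion(k, ell, eps)
ratio = delta_cap / ell
```

The loop stops at k = 4, where both quantities equal 48 in units of the short edge length, so the parameter is 8/3 times the path length.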

Now, how does REPLAN perform on this instance? In Fig. 2 we see the track
of the server following the preliminary schedules produced by REPLAN on the
request set $R$.

The maximal flow time of REPLAN on this instance is realized by the flow
time of the request $(\frac{3}{2}\ell - \varepsilon, b, a)$, which is unbounded.

Moreover, since all requests from b to a are postponed after serving all the 
requests from c to d we get that REPLAN produces an unbounded average flow 
time as well. 

In Fig. 3 we show the track of the server under the control of the IGNORE- 
algorithm. After an initial inefficient phase the server ends up in a stable oper- 
ating mode. This example also shows that the analysis of IGNORE in Sect. 4 is 
sharp. 

6 Reasonable Load as a General Framework 

We introduced the new concept of reasonable request sets, using as example the
problem OlDarp. However, the concept can be applied to any combinatorial
online problem with (possibly infinite) sets of time stamped requests, such as online
scheduling, e.g., as described by Sgall [12], or the Online Traveling Salesman
Problem, studied by Ausiello et al. [3].

The algorithms IGNORE and REPLAN represent general “online paradigms”
which can be used for any online problem with time-stamped requests. We notice
that the proof of the result that the average and maximal flow and waiting times
of IGNORE are bounded by $2\Delta$ has not explicitly drawn on any specific property
of OlDarp: this result holds for all combinatorial online problems with requests
given by their release times.

The proof that the maximal flow and waiting time of a $\Delta$-reasonable request
set is unbounded for REPLAN is equally applicable to the Online Traveling Salesman
Problem by Ausiello et al. [3]. We expect that the same is true for any
“sufficiently difficult” online problem with release times; for very simple problems,
such as OlDarp on a zero dimensional space, the result trivially does not hold.

7 Conclusion 

We have introduced the mathematical notion $\Delta$-reasonable describing the combinatorial
difficulty of a possibly infinite request set for OlDarp. For reasonable
request sets we have given bounds on the maximal resp. average flow time
of algorithm IGNORE for OlDarp; in contrast to this, there are instances of
OlDarp where algorithm REPLAN yields an unbounded maximal and average
flow time. One key property of our results is that they can be applied in continuously
working systems. Computer simulations have meanwhile supported the
theoretical results in the sense that algorithm IGNORE does not delay individual
requests for an arbitrarily long period of time, whereas REPLAN has a tendency
to do so [9].

While the notion of $\Delta$-reasonable is applicable to minimizing maximal flow
time, it would be of interest to investigate an average analogue in order to prove
non-trivial bounds for the average flow times.

Abstract. In the online traveling salesman problem requests for visits
to cities (points in a metric space) arrive online while the salesman is 
traveling. The salesman moves at no more than unit speed and starts and 
ends his work at a designated origin. The objective is to find a routing 
for the salesman which finishes as early as possible. 

We consider the online traveling salesman problem when restricted to the 
non-negative part of the real line. We show that a very natural strategy 
is 3/2-competitive which matches our lower bound. The main contribu- 
tion of the paper is the presentation of a “fair adversary” , as an alterna- 
tive to the omnipotent adversary used in competitive analysis for online 
routing problems. The fair adversary is required to remain inside the con- 
vex hull of the requests released so far. We show that on $\mathbb{R}_0^+$ algorithms
can achieve a strictly better competitive ratio against a fair adversary
than against a conventional adversary. Specifically, we present an algorithm
against a fair adversary with competitive ratio $(1 + \sqrt{17})/4 \approx 1.28$
and provide a matching lower bound. We also show competitiveness re- 
sults for a special class of algorithms (called diligent algorithms) that 
do not allow waiting time for the server as long as there are requests 
unserved. 



1 Introduction 

The traveling salesman problem is a well studied problem in combinatorial optimization.
In the classical setting, one assumes that the complete input of an
instance is available for an algorithm to compute a solution. In many cases this
offline optimization model does not reflect the real-world situation appropriately.
In many applications not all requests for points to be visited are known
in advance. Decisions have to be made online without knowledge about future
requests.

Online algorithms are tailored to cope with such situations. Whereas offline 
algorithms work on the complete input sequence, online algorithms only see the 
requests released so far and thus, in planning a route, have to account for future 
requests that may or may not arise at a later time. A common way to evaluate 
the quality of online algorithms is competitive analysis [3,5]. 

In this paper we consider the following online variant of the traveling salesman
problem (called Oltsp in the sequel) which was introduced in [2]. Cities
(requests) arrive online over time while the salesman is traveling. The requests
are to be handled by a salesman-server that starts and ends his work at a designated
origin. The objective is to find a routing for the server which finishes as
early as possible (in scheduling theory this goal is usually referred to as minimizing
the makespan). In this model it is feasible for the server to wait at the
cost of time that elapses. Decisions are revocable, as long as they have not been
executed. Only history is irrevocable.



Previous Work. Ausiello et al. [2] present a 2-competitive algorithm for Oltsp
which works in general metric spaces. The authors also show that for general
metric spaces no deterministic algorithm can be c-competitive with c < 2. For
the special case that the metric space is $\mathbb{R}$, the real line, they give a lower bound
of $(9 + \sqrt{17})/8 \approx 1.64$ and a 7/4-competitive algorithm. Just very recently Lipmann
[7] devised an algorithm that is best possible for this case with competitive
ratio equal to the aforementioned lower bound.



Our Contribution. In this paper the effect of restricting the class of algorithms 
allowed and restricting the power of the adversary in the competitive analysis 
is studied. We introduce and analyze a new class of online algorithms which we 
call diligent algorithms. Roughly speaking, a diligent algorithm never sits idle 
while there is work to do. A precise definition is presented in Section 3 where we 
also show that in general diligent algorithms are strictly weaker than algorithms 
that allow waiting time. In particular we show that no diligent algorithm can
achieve a competitive ratio lower than 7/4 for the Oltsp on the real line. The
7/4-competitive algorithm in [2] is in fact a diligent algorithm and therefore best
possible within this restricted class of algorithms.

We then concentrate on the special case of Oltsp when the underlying metric
space is $\mathbb{R}_0^+$, the non-negative part of the real line. In Section 4 we show that
an extremely simple and natural diligent strategy is 3/2-competitive and that
this result is best possible for (diligent and non-diligent) deterministic algorithms
on $\mathbb{R}_0^+$. The main contribution is contained in Section 5. Here we deal with an
objection frequently encountered against competitive analysis concerning the unrealistic
power of the adversary against which performance is measured. Indeed,
in the Oltsp on the real line the aforementioned 7/4-competitive algorithm
reaches its competitive ratio against an adversary that moves away from the
previously released requests without giving any information to the online algorithm.
We introduce a fair adversary that is in a natural way restricted in
the context of the online traveling salesman problem studied here. It should be
seen as a more reasonable adversary model. A fair adversary always keeps its
server within the convex hull of the requests released so far. We show that this
adversary model indeed allows for lower competitive ratios. For instance, the
above mentioned 3/2-competitive diligent strategy against the conventional adversary
is 4/3-competitive against the fair adversary. This result is best possible
for diligent algorithms against a fair adversary.





                     Diligent Algorithms    General Algorithms 
General Adversary    LB = UB = 3/2          LB = UB = 3/2 
Fair Adversary       LB = UB = 4/3          LB = UB = $(1 + \sqrt{17})/4$ 

Table 1. Overview of the lower bound (LB) and upper bound (UB) results for 
the competitive ratio of deterministic algorithms for Oltsp on $\mathbb{R}_0^+$ in this paper. 



We also present a non-diligent algorithm with competitive ratio 
$(1 + \sqrt{17})/4 \approx 1.28 < 4/3$ against the fair adversary. Our result is the first one that 
shows that waiting is actually advantageous in the Oltsp. The aforementioned 
algorithm in [7] also uses waiting, but became known after the one presented in 
this paper and has not been officially published yet. Such results are already 
known for online scheduling problems (see e.g. [6,4,8]) and, again very recently, 
also for an online dial-a-ride problem [1]. Our competitiveness result is 
complemented by a matching lower bound on the competitive ratio of algorithms 
against the fair adversary. Table 1 summarizes our results for Oltsp on $\mathbb{R}_0^+$. 



2 Preliminaries 

An instance of the Online Traveling Salesman Problem (Oltsp) consists of a 
metric space $M = (X, d)$ with a distinguished origin $o \in M$ and a sequence 
$\sigma = \sigma_1, \ldots, \sigma_m$ of requests. In this paper we are mainly concerned with the special 
case that M is $\mathbb{R}_0^+$, the non-negative part of the real line endowed with the 
Euclidean metric, i.e., $X = \mathbb{R}_0^+ = \{x \in \mathbb{R} : x \ge 0\}$ and $d(x, y) = |x - y|$. A 
server is located at the origin o at time 0 and can move at most at unit speed. 

Each request is a pair $\sigma_i = (t_i, x_i)$, where $t_i \in \mathbb{R}$ is the time at which 
request $\sigma_i$ is released (becomes known), and $x_i \in X$ is the point in the metric 
space requested to be visited. We assume that the sequence $\sigma = \sigma_1, \ldots, \sigma_m$ of 
requests is given in order of non-decreasing release times. For a real number t 
we denote by $\sigma_{\le t}$ and $\sigma_{<t}$ the subsequence of requests in $\sigma$ released up to time t 
and strictly before time t, respectively. 
It is assumed that the online algorithm has information neither about 
the time when the last request is released nor about the total number of requests. 

An online algorithm for Oltsp must determine the behavior of the server 
at a certain moment t of time as a function of all the requests in $\sigma_{\le t}$ (and 
the current time t). In contrast, the offline algorithm has information about all 
requests in the whole sequence $\sigma$ already at time 0. A feasible online/offline 
solution is a route for the server which serves all requested points, where each 
request is served not earlier than the time it is released, and which starts and 
ends in the origin o. 

The objective in the Oltsp is to minimize the total completion time (also 
called the “makespan” in scheduling) of the server, that is, the time when the 
server has served all requests and returned to the origin. 

Let ALG($\sigma$) denote the completion time of the server moved by algorithm ALG 
on the sequence $\sigma$ of requests. We use OPT to denote the optimal offline 
algorithm. An online algorithm ALG for Oltsp is c-competitive for a constant c 
if for every request sequence $\sigma$ the inequality $\mathrm{ALG}(\sigma) \le c \cdot \mathrm{OPT}(\sigma)$ 
holds. 
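
As an illustration of these definitions, the following Python sketch evaluates the offline optimum on $\mathbb{R}_0^+$ against the conventional (unrestricted) adversary and checks the competitiveness inequality on a single sequence. It relies on an assumption that is not stated in this form in the text: that the optimum equals max{2 max x_i, max (t_i + x_i)}, which is what one obtains when the server walks to the furthest request, waits there if necessary, and sweeps back to the origin at unit speed. The function names are hypothetical.

```python
def opt_halfline(requests):
    """Offline optimum on the non-negative half-line (unrestricted adversary).

    requests: list of (release_time, position) pairs with position >= 0.
    Assumed closed form: go to the furthest point, wait there if needed,
    then sweep back to the origin at unit speed.
    """
    if not requests:
        return 0.0
    furthest = max(x for _, x in requests)
    latest = max(t + x for t, x in requests)
    return max(2 * furthest, latest)


def is_c_competitive_on(alg_cost, requests, c):
    """Check ALG(sigma) <= c * OPT(sigma) for one request sequence."""
    return alg_cost <= c * opt_halfline(requests)
```

Passing such a check on one sequence proves nothing about an algorithm; c-competitiveness requires the inequality for every request sequence.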

3 Diligent Algorithms 

In this section we introduce a particular class of algorithms for Oltsp which we 
call diligent algorithms. Intuitively, a diligent algorithm should never sit and wait 
when it could serve yet unserved requests. A diligent server should also move 
towards work that has to be done directly without any detours. To translate this 
intuition into a rigorous definition some care has to be taken. 

Definition 1 (Diligent Algorithm). An algorithm ALG for Oltsp is called 
diligent, if it satisfies the following conditions: 

1. If there are still unserved requests, then the direction of the server operated 
by ALG changes only if a new request becomes known, or the server is either 
in the origin or at a request that has just been served. 

2. At any time when there are yet unserved requests, the server operated by ALG 
either moves towards an unserved request or the origin at maximum (i.e. 
unit) speed. The latter case is only allowed if the server operated by ALG is 
not yet in the origin. 

We emphasize that a diligent algorithm is allowed to move its server towards 
an unserved request and change its direction towards another unserved request 
or to the origin at the moment a new request becomes known. 

Lemma 1. No diligent online algorithm for Oltsp on the real line $\mathbb{R}$ has a 
competitive ratio of less than 7/4. 

Proof. Suppose that ALG is a diligent algorithm for Oltsp on the real line. 
Consider the following adversarial input sequence. At times 0 and 1/2 two 
requests $\sigma_1 = (0, 1/2)$ and $\sigma_2 = (1/2, 0)$ are released, respectively. There will be 
no further requests before time 1. Thus, by the diligence of the algorithm the 
server will be at the origin at time 1. 

At time 1 two new requests at points 1 and $-1$, respectively, are released. 
Since the algorithm is diligent, starting at time 1 it must move its server to 
one of these requests at maximum, i.e., unit, speed. Without loss of generality 
assume that this is the request at 1. ALG's server will reach this point at time 2. 
Starting at time 2, ALG will have to move its server either directly towards the 
unserved request at $-1$ or towards the origin, which essentially gives the same 
movement and implies that the server is at the origin at time 3. At that time, 
the adversary issues another request at 1. Thus, ALG's server will still need at 
least 4 time units to serve $-1$ and 1 and return to the origin. Therefore, it will 
not be able to complete its work before time 7. 

The adversary handles the sequence by first serving the request at $-1$, then 
the two requests at 1, and finally returns to the origin at time 4, yielding the 
desired result. □ 

This lower bound shows that the 7/4-competitive algorithm presented in [2], 
which is in fact a diligent algorithm, is best possible within the class of diligent 
algorithms for the Oltsp on the real line. 

4 The Oltsp on the Non-negative Part of the Real Line 

We first consider Oltsp on $\mathbb{R}_0^+$ when the offline adversary is the conventional 
(omnipotent) opponent. 

Theorem 1. No deterministic algorithm for Oltsp on $\mathbb{R}_0^+$ has a competitive 
ratio of less than 3/2. 

Proof. At time 0 the request $\sigma_1 = (0, 1)$ is released. Let T be the time at which the 
server operated by ALG has served the request $\sigma_1$ and returned to the origin 0. 
If $T \ge 3$, then no further request is released and ALG is no better than 3/2- 
competitive since $\mathrm{OPT}(\sigma_1) = 2$. Thus, assume that $T < 3$. In this case the 
adversary releases a new request $\sigma_2 = (T, T)$. Clearly, $\mathrm{OPT}(\sigma_1, \sigma_2) = 2T$. On the 
other hand $\mathrm{ALG}(\sigma_1, \sigma_2) \ge 3T$, yielding a competitive ratio of 3/2. □ 

The following extremely simple strategy achieves a competitive ratio that 
matches this lower bound (as we will show below): 

Strategy MRIN ("Move-Right-If-Necessary") If a new request is released and 
the request is to the right of the current position of the server operated 
by MRIN, then the MRIN-server starts to move right at full speed. The server 
continues to move right as long as there are yet unserved requests to the 
right of the server. If there are no more unserved requests to the right, then 
the server moves towards the origin 0 at full speed. 
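
The strategy above can be simulated with a few lines of Python. The sketch below is only an illustration under stated assumptions: requests are (release time, position) pairs on the non-negative half-line, the server starts at the origin at time 0, and a request counts as served once the server has passed over its position after its release. The function name `mrin_completion` is hypothetical.

```python
def mrin_completion(requests):
    """Completion time of the MRIN server on the non-negative half-line.

    requests: iterable of (release_time, position) pairs, position >= 0.
    """
    time, pos = 0, 0
    unserved = []                      # positions of released, unserved requests
    for release, x in sorted(requests):
        dt = release - time            # time available before the next release
        far = max(unserved + [pos])    # turning point: furthest unserved request
        right = min(dt, far - pos)     # time spent moving right
        top = pos + right
        left = min(dt - right, top)    # time spent moving towards the origin
        new_pos = top - left
        low = min(pos, new_pos)
        # every released request inside the swept interval has been visited
        unserved = [y for y in unserved if not (low <= y <= top)]
        unserved.append(x)
        time, pos = release, new_pos
    # after the last release: go to the furthest unserved point, then home
    far = max(unserved + [pos])
    return time + (far - pos) + far
```

For the sequence (0, 1), (2, 2), which is the instance from the proof of Theorem 1 with T = 2, the sketch returns 6, while an offline server can finish at time 4.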

It is easy to verify that Algorithm MRIN is in fact a diligent algorithm. The 
following theorem shows that the strategy has a best possible competitive ratio 
for Oltsp on $\mathbb{R}_0^+$. 

Theorem 2. Strategy MRIN is a diligent 3/2-competitive algorithm for the 
Oltsp on the non-negative part $\mathbb{R}_0^+$ of the real line. 



Proof. We show the theorem by induction on the number of requests in the 
sequence $\sigma$. It clearly holds if $\sigma$ contains at most one request. The induction 
hypothesis is that it holds for any sequence of $m - 1$ requests. 

Suppose that request $\sigma_m = (t, x)$ is the last request of $\sigma = \sigma_1, \ldots, \sigma_{m-1}, \sigma_m$. 
If t = 0, then MRIN is obviously 3/2-competitive, so we will assume that t > 0. 
Let f be the position of the request unserved by the MRIN-server at time t 
(excluding $\sigma_m$), which is furthest away from the origin. 

In case $x \le f$, MRIN's cost for serving $\sigma$ is equal to the cost for serving the 
sequence consisting of the first $m - 1$ requests of $\sigma$. Since new requests can never 
decrease the optimal offline cost, the induction hypothesis implies the theorem. 

Now assume that $f < x$. Thus, at time t the request in x is the furthest 
unserved request. MRIN will complete its work no later than time $t + 2x$. The 
optimal offline cost $\mathrm{OPT}(\sigma)$ is bounded from below by $\max\{t + x, 2x\}$. Therefore, 

    $\frac{\mathrm{MRIN}(\sigma)}{\mathrm{OPT}(\sigma)} \le \frac{t + x}{\mathrm{OPT}(\sigma)} + \frac{x}{\mathrm{OPT}(\sigma)} \le \frac{t + x}{t + x} + \frac{x}{2x} = \frac{3}{2}.$ 

□ 

The result established above can be used to obtain competitiveness results 
for the Oltsp on the real line when there is more than one 
server and the goal is to minimize the time at which the last of the servers returns 
to the origin 0 after all requests have been served. 

Lemma 2. There is an optimal offline strategy for Oltsp on the real line with 
$k \ge 2$ servers such that no server ever crosses the origin. 

Proof. Omitted in this abstract. □ 



Corollary 1. There is a 3/2-competitive algorithm for the Oltsp with $k \ge 2$ 
servers on the real line. □ 

5 Fair Adversaries 

The adversaries used in the bounds of the previous section are abusing their 
power in the sense that they can move to points where they know a request 
will pop up without revealing the request to the online server before reaching 
the point. As an alternative we propose the following more reasonable adversary, 
which we call the fair adversary. We show that we can obtain better competitive 
ratios for the Oltsp on $\mathbb{R}_0^+$ under this model. We will also see that under 
this adversary model there does exist a distinction in competitiveness between 
diligent and non-diligent algorithms. Recall that $\sigma_{<t}$ is the subsequence of $\sigma$ 
consisting of those requests with release time strictly smaller than t. 

Definition 2 (Fair Adversary). An offline adversary for the Oltsp in the 
Euclidean space $(\mathbb{R}^n, \|\cdot\|)$ is fair, if at any moment t, the position of the server 
operated by the adversary is within the convex hull of the origin o and the 
requested points from $\sigma_{<t}$. 

In the special case of $\mathbb{R}_0^+$ a fair adversary must always keep its server in the 
interval [0, F], where F is the position of the request with the largest distance 
to the origin 0 among all requests released so far. The following lower bound 
result shows that the Oltsp on the real line against fair adversaries is still a 
non-trivial problem. 

Theorem 3. No deterministic algorithm for Oltsp on $\mathbb{R}$ has competitive ratio 
less than $(5 + \sqrt{57})/8 \approx 1.57$ against a fair adversary. 

Proof. Suppose that there exists a c-competitive online algorithm. The 
adversarial sequence starts with two requests at time 0, $\sigma_1 = (0, 1)$ and $\sigma_2 = (0, -1)$. 
Without loss of generality, we suppose that the first request that is served is $\sigma_1$. 
At time 2 the online server can't have served both requests. We distinguish two 
main cases divided in some sub-cases. 

Case 1: None of the requests has been served at time 2. 

- If at time 3 request $\sigma_1$ is still unserved, let $t'$ be the first time the server 
crosses the origin after serving the request. Clearly, $t' \ge 4$. At time $t'$ the 
online server still has to visit the request in $-1$. If $t' > 4c - 2$ the server can 
not be c-competitive because the fair adversary can finish the sequence at 
time 4. 

Thus, suppose that $4 \le t' \le 4c - 2$. At time $t'$ a new request $\sigma_3 = (t', 1)$ 
is released. The online server can not finish the complete sequence before 
$t' + 4$, whereas the adversary needs time $t' + 1$. Therefore, $c \ge \frac{t' + 4}{t' + 1}$ and for 
$4 \le t' \le 4c - 2$ we have that 

    $c \ge \frac{(4c - 2) + 4}{(4c - 2) + 1} = \frac{4c + 2}{4c - 1},$ 

implying $c \ge (5 + \sqrt{57})/8$. 

- If at time 3 the request $\sigma_1$ has already been served, the online server can not 
be to the left of the origin at time 3 (given the fact that at time 2 no request 
had been served). The adversary now gives a new request $\sigma_3 = (3, 1)$. There 
are two possibilities: either $\sigma_2$, the request in $-1$, is served before $\sigma_3$ or the 
other way round. 

If the server decides to serve $\sigma_2$ before $\sigma_3$ then it can not complete before 
time 7. Since the adversary completes the sequence in time 4, the competitive 
ratio is at least 7/4. 

If the online server serves $\sigma_3$ first, then again, let $t'$ be the time that the server 
crosses the origin after serving $\sigma_3$. As before, we must have $4 \le t' \le 4c - 2$. 
At time $t'$ the fourth request $\sigma_4 = (t', 1)$ is released. The same arguments as 
above apply to show that the competitive ratio of the algorithm is at least 
$(5 + \sqrt{57})/8$. 

Case 2: One of the requests has been served at time 2 by the online server. 

We assume without loss of generality that $\sigma_1$ has been served. At time 2 the 
third request $\sigma_3 = (2, 1)$ is released. In fact, we are back in the situation in 
which at time 2 none of the two requests are served. In case the movements of 
the online server are such that no further request is released by the adversary, 
the latter will complete at time 4. In the other cases the last released requests 
are released after time 4 and the adversary can still reach them in time. □ 
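
For completeness, the step from the inequality $c \ge (4c + 2)/(4c - 1)$ in Case 1 to the bound of Theorem 3 can be spelled out. This is a routine calculation that is not written out in the proof; it uses only that $4c - 1 > 0$ for $c \ge 1$.

```latex
\[
  c \ge \frac{4c+2}{4c-1}
  \;\Longleftrightarrow\; c\,(4c-1) \ge 4c+2
  \;\Longleftrightarrow\; 4c^2 - 5c - 2 \ge 0
  \;\Longleftrightarrow\; c \ge \frac{5+\sqrt{25+32}}{8} = \frac{5+\sqrt{57}}{8} \approx 1.57 .
\]
```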

For comparison, the lower bound on the competitive ratio for the Oltsp in $\mathbb{R}$ 
against an adversary that is not restricted to be fair is $(9 + \sqrt{17})/8$ [2]. As 
mentioned before, only recently Lipmann [7] presented a $(9 + \sqrt{17})/8$-competitive 
algorithm against a non-fair adversary. He conjectures that a similar type of 
algorithm will also be best possible against a fair adversary. In contrast, the 
picture for the problem on the non-negative part of the real line is already 
complete (see Theorems 5 and 6 for diligent algorithms and Theorems 4 and 7 for 
non-diligent algorithms below). 

Theorem 4. No deterministic algorithm for Oltsp on $\mathbb{R}_0^+$ has competitive 
ratio of less than $(1 + \sqrt{17})/4 \approx 1.28$ against a fair adversary. 

Proof. Suppose that ALG is c-competitive. At time 0 the adversary releases 
request $\sigma_1 = (0, 1)$. Let T denote the time at which the server operated by ALG has 
served this request and is back at the origin. For ALG to be c-competitive, we 
must have that $T \le c \cdot \mathrm{OPT}(\sigma_1) = 2c$, otherwise no further requests will be 
released. At time T the adversary releases a second request $\sigma_2 = (T, 1)$. The 
completion time of ALG then becomes at least $T + 2$. 

On the other hand, starting at time 0 the fair adversary moves its server 
to 1, lets it wait there until time T and then goes back to the origin 0, yielding 
a completion time of $T + 1$. Therefore, 

    $\frac{\mathrm{ALG}(\sigma)}{\mathrm{OPT}(\sigma)} \ge \frac{T + 2}{T + 1} \ge \frac{2c + 2}{2c + 1} = 1 + \frac{1}{2c + 1},$ 

given the fact that $T \le 2c$. Since by assumption ALG is c-competitive, we have 
that $1 + 1/(2c + 1) \le c$, implying that $c \ge (1 + \sqrt{17})/4$. □ 
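
The last implication in the proof amounts to solving a quadratic inequality. The short derivation below is not written out in the proof; it uses only that $2c + 1 > 0$.

```latex
\[
  1 + \frac{1}{2c+1} \le c
  \;\Longleftrightarrow\; 1 \le (c-1)(2c+1) = 2c^2 - c - 1
  \;\Longleftrightarrow\; 2c^2 - c - 2 \ge 0
  \;\Longleftrightarrow\; c \ge \frac{1+\sqrt{17}}{4} \approx 1.28 .
\]
```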

For diligent algorithms we can show a higher lower bound against a fair 
adversary. 

Theorem 5. No deterministic diligent algorithm for Oltsp on $\mathbb{R}_0^+$ has 
competitive ratio of less than 4/3 against a fair adversary. 

Proof. Consider the adversarial sequence $\sigma_1 = (0, 1)$, $\sigma_2 = (1, 0)$, and $\sigma_3 = (2, 1)$. 
By its diligence the online algorithm will start to travel to 1 at time 0, back to 0 
at time 1, arriving there at time 2. Then its server has to visit 1 again, so that 
it will finish no earlier than time 4. Obviously, the optimal offline solution is to 
leave 1 no earlier than time 2, finishing at time 3. □ 
We show now that the algorithm MRIN presented before has a better 
competitive ratio against the fair adversary than the ratio of 3/2 achieved against a 
conventional adversary. In fact we show that the ratio matches the lower bound 
for diligent algorithms proved in the previous theorem. 

Theorem 6. Strategy MRIN is a 4/3-competitive algorithm for the Oltsp on $\mathbb{R}_0^+$ 
against a fair adversary. 

Proof. Omitted in this abstract. 

Thus, Algorithm MRIN attains a best possible competitive ratio against the 
fair adversary among all diligent algorithms. Given the lower bound for general 
non-diligent algorithms in Theorem 4 we aim now at designing an online algo- 
rithm that obtains better competitive ratios against a fair adversary. In view 
of Theorem 5 such an algorithm will have to be non-diligent, i.e., incorporate 
waiting times. 

The problem with Algorithm MRIN is that a new request to the right of its 
server may arrive shortly after it starts to return towards the origin from the 
furthest previously unserved request. In this case the MRIN-server has to return to 
a position it just left. Algorithm WS presented below successfully avoids 
this pitfall. 

Strategy WS ("Wait Smartly") The WS-server moves right if there are yet 
unserved requests to the right of its present position. Otherwise, it takes 
the following actions. Suppose it arrives at its present position $s(t)$, which is 
the currently rightmost unserved request, at time t. 

1. Compute the optimal offline solution value $\mathrm{OPT}(\sigma_{\le t})$ for all requests 
released up to time t. 

2. Determine a waiting time $W := \alpha \cdot \mathrm{OPT}(\sigma_{\le t}) - s(t) - t$, with 
$\alpha = (1 + \sqrt{17})/4$. 

3. Wait at point $s(t)$ until time $t + W$ and then start to move back to the 
origin 0. 

We notice that when the server is moving back to the origin and no new 
requests are released until time $t + W + s(t)$, then the WS-server reaches the 
origin 0 at time $t + W + s(t) = \alpha \cdot \mathrm{OPT}(\sigma_{\le t})$ having served all requests released 
so far. If a new request is released at time $t' \le W + t + s(t)$ and the request is 
to the right of $s(t')$, then the WS-server starts to move to the right immediately 
until it reaches the furthest unserved request. 
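
To make step 2 of the strategy concrete, the following Python sketch evaluates the waiting time and the resulting return time for given inputs. It is only an illustration: the offline optimum is assumed to be supplied by the caller, since computing it is not part of this sketch, and the function names are hypothetical.

```python
import math

ALPHA = (1 + math.sqrt(17)) / 4   # competitive ratio of WS, about 1.28


def ws_waiting_time(opt_value, position, arrival_time):
    """Waiting time W = alpha * OPT - s(t) - t from step 2 of Strategy WS.

    opt_value: offline optimum for the requests released so far
               (assumed to be computed elsewhere and passed in).
    position: s(t), the point at which the server has just arrived.
    arrival_time: t, the time of arrival at s(t).
    """
    return ALPHA * opt_value - position - arrival_time


def ws_return_time(opt_value, position, arrival_time):
    """Time at which the server is back at the origin if nothing new arrives."""
    wait = ws_waiting_time(opt_value, position, arrival_time)
    return arrival_time + wait + position
```

For a single request (0, 1) the server arrives at position 1 at time 1 and the offline optimum is 2, so the waiting time is $2\alpha - 2 \approx 0.56$ and the server is back at the origin at time $2\alpha \approx 2.56$.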

Theorem 7. Algorithm WS is $\alpha$-competitive with $\alpha = (1 + \sqrt{17})/4 \approx 1.28$ for 
the Oltsp on $\mathbb{R}_0^+$ against a fair adversary. 

Proof. By the definition of the waiting time it is sufficient to prove that at any 
point where a waiting time is computed this waiting time is non-negative. In 
that case the server will always return to o by time $\alpha \cdot \mathrm{OPT}(\sigma)$. This is clearly 
true if the sequence $\sigma$ contains at most one request. We make the induction 
hypothesis that it is also true for any sequence of at most $m - 1$ requests. 
Let $\sigma = \sigma_1, \ldots, \sigma_m$ be any sequence of requests and let $\sigma_m = (t, x)$ be the 
request released last. If t = 0, then there is nothing left to show, so we will 
assume for the remainder of the proof that t > 0. 

We denote by $s(t)$ and $s^*(t)$ the positions of the WS-server and the fair 
adversary's server at time t, respectively. We also let $r_f = (t_f, f)$ be the furthest 
(i.e. most remote from the origin) request not yet served by WS at time t, excluding 
the request $\sigma_m$. Finally, let $r_F = (t_F, F)$ be the furthest released request 
in $\sigma_1, \ldots, \sigma_{m-1}$. Obviously $f \le F$. We distinguish three different cases 
depending on the position of x relative to f and F. 

Case 1: $x \le f$ 

The WS-server has to travel to f anyway, and by the induction hypothesis 
there was a non-negative waiting time in f or $s(t)$ (depending on whether $s(t) > f$ 
or $s(t) \le f$) before request $\sigma_m$ was released. The waiting time in f or $s(t)$ can 
not decrease, since the optimal offline completion time can not decrease by an 
additional request. 

Case 2: $f < x \le F$ 

If $s(t) \ge x$, then again by the induction hypothesis and the fact that the route 
length of WS's server does not increase, the possible waiting time at $s(t)$ is 
non-negative. 

Thus we can assume that $s(t) < x$. The WS-server will now travel to point x, 
arrive there at time $t + d(s(t), x)$, and possibly wait there some time W before 
returning to the origin, with 

    $W = \alpha\,\mathrm{OPT}(\sigma) - (t + d(s(t), x)) - x.$ 

Inserting the obvious lower bound $\mathrm{OPT}(\sigma) \ge t + x$ yields 

    $W \ge (\alpha - 1)\,\mathrm{OPT}(\sigma) - d(s(t), x).$    (1) 

To bound $\mathrm{OPT}(\sigma)$ in terms of $d(s(t), x)$ consider the time $t'$ when the 
WS-server had served the request at F and started to move left. Clearly $t' < t$ 
since otherwise $s(t)$ could not be smaller than x as assumed. Thus, the 
subsequence $\sigma_{\le t'}$ of $\sigma$ does not contain $(t, x)$. By the induction hypothesis, WS is 
$\alpha$-competitive for the sequence $\sigma_{\le t'}$. At time $t'$ when it left F it would have 
arrived in the origin at time $\alpha\,\mathrm{OPT}(\sigma_{\le t'})$, i.e., 

    $t' + F = \alpha \cdot \mathrm{OPT}(\sigma_{\le t'}).$    (2) 

Notice that $t \ge t' + d(F, s(t))$. Since $\mathrm{OPT}(\sigma_{\le t'}) \ge 2F$ we obtain from (2) that 

    $t \ge \alpha 2F - F + d(F, s(t)) = (2\alpha - 1)F + d(s(t), F).$    (3) 

Since by assumption we have $s(t) < x \le F$ we get that $d(s(t), x) \le d(s(t), F)$ 
and $d(s(t), x) \le F$, which inserted in (3) yields 

    $t \ge (2\alpha - 1)\,d(s(t), x) + d(s(t), x) = 2\alpha\,d(s(t), x).$    (4) 

We combine this with the previously mentioned lower bound $\mathrm{OPT}(\sigma) \ge t + x$ to 
obtain: 

    $\mathrm{OPT}(\sigma) \ge 2\alpha\,d(s(t), x) + x \ge (2\alpha + 1)\,d(s(t), x).$    (5) 

Using inequality (5) in (1) gives 

    $W \ge (\alpha - 1)(2\alpha + 1)\,d(s(t), x) - d(s(t), x)$ 
    $= (2\alpha^2 - \alpha - 2)\,d(s(t), x)$ 
    $= 0,$ 

since $\alpha = (1 + \sqrt{17})/4$ is a root of $2\alpha^2 - \alpha - 2$. 
This completes the proof for the second case. 

Case 3: $f \le F < x$ 

Starting at time t the WS-server moves to the right until it reaches x, and after 
waiting there an amount W returns to 0, with 

    $W = \alpha\,\mathrm{OPT}(\sigma) - (t + d(s(t), x)) - x.$    (6) 

We will show that also in this case $W \ge 0$. At time t the adversary's server still 
has to travel at least $d(s^*(t), x) + x$ units. This results in 

    $\mathrm{OPT}(\sigma) \ge t + d(s^*(t), x) + x.$ 

Since the offline adversary is fair, its position $s^*(t)$ at time t can not be strictly 
to the right of F. Hence 

    $\mathrm{OPT}(\sigma) \ge t + d(F, x) + x.$    (7) 

Insertion into (6) yields 

    $W \ge (\alpha - 1)\,\mathrm{OPT}(\sigma) - d(s(t), F)$    (8) 

since $F \ge s(t)$ by definition of the algorithm. 

The rest of the arguments are similar to those used in the previous case. 
Again we have that WS's server started to move to the left from F at some time $t' < t$, 
and 

    $t' + F = \alpha \cdot \mathrm{OPT}(\sigma_{\le t'}).$    (9) 

Since $t \ge t' + d(s(t), F)$ and $\mathrm{OPT}(\sigma_{\le t'}) \ge 2F$ we obtain from (9) that 

    $t \ge \alpha 2F - F + d(s(t), F) = (2\alpha - 1)F + d(s(t), F) \ge 2\alpha\,d(s(t), F).$ 

We combine this with (7) and the fact that $x \ge d(s(t), F)$ to achieve 

    $\mathrm{OPT}(\sigma) \ge 2\alpha\,d(s(t), F) + d(F, x) + x \ge (2\alpha + 1)\,d(s(t), F).$ 

Using this inequality in (8) gives 

    $W \ge (\alpha - 1)(2\alpha + 1)\,d(s(t), F) - d(s(t), F)$ 
    $= (2\alpha^2 - \alpha - 2)\,d(s(t), F)$ 
    $= \left( \frac{9 + \sqrt{17}}{4} - \frac{1 + \sqrt{17}}{4} - 2 \right) d(s(t), F)$ 
    $= 0.$ 

This completes the proof. □ 
6 Conclusions 

We introduced an alternative, fairer performance measure for online 
algorithms for the traveling salesman problem. The first results are encouraging. 
On the non-negative part of the real line the fair model allows a strictly lower 
competitive ratio than the conventional model with an omnipotent adversary. 

Next to that we considered a restricted class of algorithms for the online 
traveling salesman problem, suggestively called diligent algorithms. We showed 
that in general diligent algorithms have strictly higher competitive ratios than 
algorithms that sometimes leave the server idle to wait for possible additional 
information. In online routing companies, such as courier services or 
transportation companies, waiting instead of starting immediately when requests are 
presented is common practice. Our results support this strategy. 

It is still open to find a best possible non-diligent algorithm for the problem on 
the real line against a fair adversary. However, it is very likely that an algorithm 
similar to the best possible algorithm presented in [7] against a non-fair adversary 
will turn out to be best possible for this case. 

We notice here that for general metric spaces the lower bound of 2 on the 
competitive ratio of algorithms in [2] is established with a fair adversary as 
opponent. Moreover, a diligent algorithm is presented which has a competitive 
ratio that meets the lower bound. 

We hope to have encouraged research into ways to restrict the power of 
adversaries in online competitive analysis. 

Acknowledgement: Thanks to Maarten Lipmann for providing the lower 
bound in Theorem 3. 
Abstract. We present a practically efficient algorithm for the internal 
sorting problem. Our algorithm works in-place and, on the average, has a 
running-time of $O(n \log n)$ in the length n of the input. More specifically, 
the algorithm performs $n \log n + 3n$ comparisons and $n \log n + 2.65n$ 
element moves on the average. 

An experimental comparison of our proposed algorithm with the most 
efficient variants of Quicksort and Heapsort is carried out and its results 
are discussed. 

Keywords: In-place sorting, heapsort, quicksort, analysis of algorithms. 



1. Introduction 

The problem of sorting an initially unordered collection of keys is one of the most 
classical and investigated problems in computer science. Many different sorting 
algorithms exist in the literature. Among the comparison-based sorting methods, 
Quicksort [7, 17, 18] and Heapsort [4, 21] turn out, in most cases, to be the most 
efficient general-purpose sorting algorithms. 

A good measure of the running-time of a sorting algorithm is given by the 
total number of key comparisons and the total number of element moves 
performed by it. In our presentation, we mainly focus our attention on the number of 
comparisons, since this often represents the dominant cost in any reasonable 
implementation. Accordingly, to sort n elements the classical Quicksort algorithm 
performs $1.386\,n \log n - 2.846\,n + 1.386 \log n$ key comparisons on the average 
and $O(n^2)$ key comparisons in the worst-case, whereas the classical Heapsort 
algorithm, due to Floyd [4], performs $2n \log n + O(n)$ key comparisons in the 
worst-case. 

Several variants of Heapsort are reported in the literature. One of the most 
efficient is the Bottom-Up-Heapsort algorithm discussed by Wegener in [19], 
which performs $n \log n + f(n)\,n$ key comparisons on the average, where 
$f(n) \in [0.34..0.39]$, and no more than $1.5\,n \log n + O(n)$ key comparisons in the 
worst-case. In [9, 10], Katajainen uses a median-finding procedure to reduce the 
number of comparisons required by Bottom-Up-Heapsort, completely eliminating 
the sift-up phase. This idea has been further refined by Rosaz in [16]. It is to 
be noted, though, that the algorithms described in [9, 10, 16] are mostly of 
theoretical interest only, due to the overhead introduced by the median-finding 
procedure. 

This paper tries to build a bridge between theory and practice. More specifi- 
cally, our goal is to produce a practical sorting algorithm which couples some of 
the theoretical ideas introduced in the algorithms cited above with the efficient 
strategy used by Quicksort. 

Compared to Quicksort, our proposed algorithm, called QuickHeapsort, works 
"in-place", i.e. no stack is needed for recursion. Moreover, its average number 
of comparisons is shown to be less than $n \log n + 3n$. Its behavior is also 
analyzed from an experimental point of view by comparing it to that of Heapsort, 
Bottom-Up-Heapsort, and some variants of Quicksort. The results show that 
QuickHeapsort has a good practical behavior especially when key comparison 
operations are computationally expensive. 

The paper is organized as follows. In Section 2 we introduce a variant of the 
Heapsort algorithm which does not work in-place, just to present the main idea 
upon which QuickHeapsort is based. The QuickHeapsort algorithm is fully de- 
scribed and analyzed in Section 3. An experimental session with some empirical 
results aiming at evaluating and comparing its efficiency in practice is discussed 
in Section 4. Section 5 concludes the paper with some final remarks. 

2. A Not In-Place Variant of the Heapsort Algorithm 

In this section we illustrate a variant of the Heapsort algorithm, 
External-Heapsort, which uses an external array to store the output. For this reason, 
External-Heapsort is mainly of theoretical interest and we present it just to 
introduce the main idea upon which the QuickHeapsort algorithm, to be described in 
the next section, is based. External-Heapsort sorts n elements in $O(n \log n)$ time 
by performing at most $n \lfloor \log n \rfloor + 2n$ key comparisons, and at most 
$n \lfloor \log n \rfloor + 4n$ element moves in the worst-case. 

We begin by recalling some basic concepts about the classical binary-heap 
data structure. A max-heap is a binary tree with the following properties: 

1. it is heap-shaped: every level is complete, with the possible exception of the 
last one; moreover the leaves in the last level occupy the leftmost positions; 

2. it is max-ordered: the key value associated with each non-root node is not 
larger than that of its parent. 

A min-heap can be defined by substituting the max-ordering property with the 
dual min-ordering one. The root of a max-heap (resp. min-heap) always contains 
the largest (resp. smallest) element of the heap. We refer to the number of 
elements in a heap as its size; the height of a heap is the height of the associated 
binary tree. 

A heap data structure of size n can be implicitly stored in an array A[1..n] 
with n elements without using any additional pointer as follows. The root of the 
heap is the element A[1]. Left and right sons (if they exist) of the node stored 
into A[i] are, respectively, A[2i] and A[2i + 1], and the parent of the node stored 
into A[i] (with i > 1) is $A[\lfloor i/2 \rfloor]$. 
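
The index arithmetic of the implicit representation can be written down directly. The following Python sketch is only an illustration: it emulates the 1-based array A[1..n] by leaving slot 0 of a Python list unused, and the helper names are hypothetical.

```python
def left(i):
    """Index of the left son of node i."""
    return 2 * i


def right(i):
    """Index of the right son of node i."""
    return 2 * i + 1


def parent(i):
    """Index of the parent of node i (for i > 1)."""
    return i // 2


def is_max_heap(a, n):
    """Check the max-ordering property of a[1..n]; a[0] is unused."""
    return all(a[parent(i)] >= a[i] for i in range(2, n + 1))
```

For example, the list `[None, 9, 5, 6, 1, 5, 2]` represents a max-heap of size 6 whose root is 9.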

In all Heapsort algorithms, the input array is sorted in ascending order by 
first building a max-heap and then performing n extractions of the root 
element. After each extraction, the element in the last leaf is first moved into 
the root and subsequently moved down along a suitable path until the 
max-ordering property is restored. 

Bottom-Up-Heapsort works much like the classical Heapsort algorithm. The 
only difference lies in the rearrangement strategy used after each max-extraction: 
starting from the root, it iteratively moves down to the child containing the 
larger key; when a leaf is reached, it climbs up until it finds a node x with a key 
larger than the root key. Subsequently, all elements in the path from x to the 
root are shifted one position up and the old root is moved into x. 
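
The rearrangement step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation analyzed in [19]: the heap lives in a 1-based list with slot 0 unused, the same routine is reused (with an arbitrary subtree root) to build the initial heap, and the names `bottom_up_reheap` and `bottom_up_heapsort` are hypothetical.

```python
def bottom_up_reheap(a, root, n):
    """Restore max-ordering in the subtree of a[1..n] rooted at `root`,
    assuming only a[root] may violate it."""
    x = a[root]
    i = root
    # Descend to a leaf, always following the child with the larger key.
    while 2 * i <= n:
        child = 2 * i
        if child < n and a[child + 1] > a[child]:
            child += 1
        i = child
    # Climb back up to the first node whose key is not smaller than x.
    while i > root and a[i] < x:
        i //= 2
    # Shift the keys on the path one level up and place x in the free slot.
    carry = x
    while i >= root:
        a[i], carry = carry, a[i]
        i //= 2


def bottom_up_heapsort(items):
    """Return the items in ascending order."""
    n = len(items)
    a = [None] + list(items)           # a[0] is unused
    for root in range(n // 2, 0, -1):  # build the initial max-heap
        bottom_up_reheap(a, root, n)
    for size in range(n, 1, -1):
        a[1], a[size] = a[size], a[1]  # move the maximum to its final place
        bottom_up_reheap(a, 1, size - 1)
    return a[1:]
```

The descent performs one comparison per level, and the climb usually stops after a few steps because the element moved into the root comes from a leaf and therefore tends to be small.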

The algorithm External-Heapsort, whose pseudo-code is shown in Fig. 1, 
takes the elements of the input array A[1..n] and returns them in ascending 
sorted order into the output array Ext[1..n]. 

External-Heapsort starts by constructing a heap and successively performs n 
extractions of the largest element. Extracted elements are moved into the output 
array in reverse order. After each extraction, the heap property is restored by 
a procedure similar to the bottom-phase of Bottom-Up-Heapsort. Specifically, 
starting at the root of the heap, the son with the largest key is chosen and it 
is moved one level up. The same step is iteratively repeated until a leaf, called 
special leaf, is reached. At this point the value $-\infty$ is stored into the special leaf
key field.¹ The path from the root to the special leaf is called a special path.

Notice that no sift-up phase is executed and that the length of special paths 
does not decrease during the execution of the algorithm. 

External-Heapsort makes use of the procedure Build-Heap and the function
Special-Leaf. Build-Heap rearranges the input array A[1..n] into a classical
max-heap, e.g., by using the standard heap-construction algorithm by Floyd [4]. The
function Special-Leaf, whose pseudo-code is also shown in Fig. 1, assumes that
the value contained in the root of the heap has already been removed.

Correctness of the algorithm follows by observing that if a node x contains
the key $-\infty$, then the whole sub-tree rooted at x contains only $-\infty$'s. It is easy to
check that the max-ordering property is fulfilled at each extraction.
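As a concrete companion to the description above, here is a minimal Python sketch of External-Heapsort. It assumes numeric keys so that `-math.inf` can play the role of the $-\infty$ key; the function names and the 1-based lists with an unused slot 0 are our own choices for illustration.

```python
import math


def build_heap(A, n):
    """Classical heap construction on the 1-based max-heap A[1..n]."""
    for r in range(n // 2, 0, -1):
        x, i = A[r], r
        while 2 * i <= n:
            c = 2 * i
            if c < n and A[c] < A[c + 1]:
                c += 1
            if A[c] <= x:
                break
            A[i] = A[c]
            i = c
        A[i] = x


def special_leaf(A, n):
    """Move the larger son one level up along the special path and
    return the index of the special leaf (the root is assumed empty)."""
    i = 2
    while i < n:
        if A[i] < A[i + 1]:
            i += 1
        A[i // 2] = A[i]
        i *= 2
    if i == n:                       # node n is an only child
        A[i // 2] = A[n]
        i *= 2
    return i // 2


def external_heapsort(keys):
    """Return the keys in ascending order using an external output array."""
    n = len(keys)
    A = [None] + list(keys)          # A[1..n], slot 0 unused
    ext = [None] * (n + 1)           # Ext[1..n], slot 0 unused
    build_heap(A, n)
    for j in range(n, 0, -1):
        ext[j] = A[1]                # current maximum
        leaf = special_leaf(A, n)
        A[leaf] = -math.inf          # the special leaf receives -infinity
    return ext[1:]
```

Note that the heap size n never shrinks: exhausted positions simply fill up with $-\infty$, which is why no sift-up phase is needed.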

We proceed now to the analysis of the number of key comparisons and
element moves performed by External-Heapsort both in the worst and in the
average case.²

Many variants for building heaps have been proposed in the literature [1, 6, 
13, 19, 20], requiring quite involved implementations. Since our goal is to give 
a practical and efficient general-purpose sorting algorithm, we simply use the 



¹ We assume that a key value $-\infty$, smaller than all keys occurring in A[1..n], is
available.

² In the average-case analysis we make use of the assumption that all permutations
are equally likely.
PROCEDURE External-Heapsort (Input Array A, Integer n; Output Array Ext)
Var Integer j, l;
Begin
    Build-Heap (A, n);
    For j := n downto 1 do
        Ext[j] := A[1];
        l := Special-Leaf (A, n);
        Key(A[l]) := −∞;
    End for;
End;

FUNCTION Special-Leaf (Input Array A, Integer n) : Integer
Var Integer i;
Begin
    i := 2;
    While i < n do
        If Key(A[i]) < Key(A[i + 1]) then i := i + 1; End if;
        A[⌊i/2⌋] := A[i];
        i := 2i;
    End while;
    If i = n then
        A[i/2] := A[n];
        i := 2i;
    End if;
    Return i/2;
End;



Fig. 1. Pseudo-code of External-Heapsort algorithm 



classical heap-construction procedure due to Floyd [4, 11]. In such a case we 
need the following partial results [2, 11, 15]: 

Lemma 1. In the worst-case, the classical heap-construction algorithm builds a
heap with n elements by performing at most 2n key comparisons and 2n element
moves. □



Lemma 2. On the average, constructing a full-heap, i.e. a heap of size
$n = 2^l - 1 > 0$, with the classical algorithm requires 1.88n key comparisons and
1.53n element moves. □

The number of key comparisons and element moves performed by the Special-Leaf
function obviously depends only on the size of the input array, so that worst-
and average-case values coincide for it.

Lemma 3. Given a heap of size n, the Special-Leaf function performs exactly
$\lfloor \log n \rfloor$ or $\lfloor \log n \rfloor - 1$ key comparisons and the same number of element moves.

□
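Lemma 3 can be checked empirically with a small instrumented version of Special-Leaf. The sketch below is only an illustration: the function name and the counters are ours, and a descending sorted list is used as a convenient valid max-heap. In this sketch the move count can exceed the comparison count by one when node n is an only child, but both stay within $\{\lfloor \log n \rfloor - 1, \lfloor \log n \rfloor\}$.

```python
def special_leaf_counting(A, n):
    """Special-Leaf on the 1-based heap A[1..n], instrumented.

    Returns (special leaf index, key comparisons, element moves).
    """
    comparisons = 0
    moves = 0
    i = 2
    while i < n:
        comparisons += 1             # one comparison between the two sons
        if A[i] < A[i + 1]:
            i += 1
        A[i // 2] = A[i]
        moves += 1
        i *= 2
    if i == n:                       # only child: a move, no comparison
        A[i // 2] = A[n]
        moves += 1
        i *= 2
    return i // 2, comparisons, moves
```

Since the path length depends only on the shape of the heap, the counts are the same for every heap of a given size that follows the same special path, which is the reason worst-case and average-case values are so close.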



The preceding lemmas yield immediately the following result. 

Theorem 1. External-Heapsort sorts n elements in $\Theta(n \log n)$ worst-case time
by performing fewer than $n \lfloor \log n \rfloor + 2n$ key comparisons and
$n \lfloor \log n \rfloor + 4n$ element moves.³ Moreover, on the average,
$H^{cmp}_{avg}(n)$ key comparisons and $H^{mov}_{avg}(n)$ element moves are
performed, where

$$n \lfloor \log n \rfloor + 0.88n \le H^{cmp}_{avg}(n) \le n \lfloor \log n \rfloor + 1.88n,$$
$$n \lfloor \log n \rfloor + 0.53n \le H^{mov}_{avg}(n) \le n \lfloor \log n \rfloor + 3.53n.$$

□

In the following, we will also use a min-heap variant of the External-Heapsort 
algorithm. In particular, special paths in min-heaps are obtained by following
the children with smallest key and the value $-\infty$ is replaced by $+\infty$. Obviously,
the same complexity analysis can be carried out for the min-heap variant of
External-Heapsort.

3. QuickHeapsort 

In this section, a practical and efficient in-place sorting algorithm, called Quick- 
Heapsort, is presented. It is obtained by a mix of two classical algorithms: 
Quicksort and Heapsort. More specifically, QuickHeapsort combines the Quick- 
sort partition step with two adapted min-heap and max-heap variants of the 
External-Heapsort algorithm presented in the previous section, where in place
of the infinity keys $\pm\infty$, only occurrences of keys in the input array are used.

As we will see, QuickHeapsort works in place and is completely iterative, so 
that additional space is not required at all. 

The computational complexity analysis of the proposed algorithm reveals
that the number of key comparisons performed is less than n log n + 3n on the
average, with n the size of the input, whereas the worst-case analysis remains
the same as that of classical Quicksort. From an implementation point of view,
QuickHeapsort preserves the efficiency of Quicksort, and in many cases it has better
running times than Quicksort, as the experiments in Section 4 illustrate.

Analogously to Quicksort, the first step of QuickHeapsort consists in choosing 
a pivot, which is used to partition the array. We refer to the sub-array with the 
smallest size as heap area, whereas the largest size sub-array is referred to as 
work area. Depending on which of the two sub-arrays is taken as heap-area, the 

³ As will be clear in the next section, it is convenient to count the assignment of $-\infty$
to a node as an element move.
adapted max-heap or min-heap variant of External-Heapsort is applied and the
work area is used as an external array. At the end of this stage, the elements
moved into the work area are in correct sorted order and the remaining unsorted
part of the array can be processed iteratively in the same way.

A detailed description of the algorithm follows. 

1. Let A[1..n] be the input array of n elements to be sorted in ascending order.
A pivot M, of index m, is chosen in the set {A[1], A[2], ..., A[n]}. As in
Quicksort, the choice of the pivot can be done in a deterministic way (with
or without sampling) or randomly. The computational complexity analysis
of the algorithm is influenced by the choice adopted.

2. The array A[1..n] is partitioned into two sub-arrays, A[1..Pivot − 1] and
A[Pivot + 1..n], such that A[Pivot] = M, the keys in A[1..Pivot − 1] are
larger than or equal to M, and the keys in A[Pivot + 1..n] are smaller than
or equal to M.⁴ The sub-array with the smallest size is assumed to be the
heap area, whereas the other one is treated as the work area (if the two
sub-arrays have the same size, either one can be chosen as heap area).

3. Depending on which sub-array is taken as heap area, the adapted max-heap
or min-heap variant of External-Heapsort is applied using the work area as
external array. Moreover, occurrences of keys contained in the work area are
used in place of the infinity values $\pm\infty$.

More precisely, if A[1..Pivot − 1] is the heap area, then the max-heap version
of External-Heapsort is applied to it using the right-most region of the work
area as external array. In this case, at the end of the stage, the right-most
region of the work area will contain the elements formerly in A[1..Pivot − 1]
in ascending sorted order.

Similarly, if A[Pivot + 1..n] is the heap area, then the min-heap version of
External-Heapsort is applied to it using the left-most region of the work area
as external array. In this case, at the end of the stage, the left-most region
of the work area will contain the elements formerly in A[Pivot + 1..n] in
ascending sorted order.

4. The element A[Pivot] is moved to its correct place and the remaining part
of A[1..n], i.e. the heap area together with the unused part of the work area,
is iteratively sorted.

Correctness of QuickHeapsort follows from that of the max-heap and min-heap
variants of External-Heapsort, by observing that assigning to a special leaf
a key value taken in the work area is completely equivalent to assigning the key
value $-\infty$, in the case of the max-heap variant, or the key value $+\infty$, in the case
of the min-heap variant.
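The four steps above can be put together in a short Python sketch. It is an illustration under stated assumptions rather than the authors' C code: the pivot is always the first element of the current range, the partition is a simple single-scan one, indices are 0-based, and the helper names (`_partition_desc`, `_heap_stage`, `above`) are hypothetical.

```python
def _partition_desc(a, lo, hi):
    """Partition a[lo..hi] around the pivot a[lo] in the reverse way:
    larger keys end up on the left, smaller or equal keys on the right.
    Returns the final index of the pivot."""
    pivot = a[lo]
    p = lo
    for k in range(lo + 1, hi + 1):
        if a[k] > pivot:
            p += 1
            a[p], a[k] = a[k], a[p]
    a[lo], a[p] = a[p], a[lo]
    return p


def _heap_stage(a, base, n, out, step, above):
    """Adapted External-Heapsort on the heap area a[base..base+n-1].

    Extracted keys are written to the work-area slots out, out+step, ...;
    the key displaced from each slot is stored into the special leaf,
    where it acts as the infinity key. above(x, y) tells whether x must
    sit closer to the root than y (x > y for a max-heap)."""
    def at(i):                       # 1-based heap index -> array index
        return base + i - 1

    for r in range(n // 2, 0, -1):   # classical heap construction
        x, i = a[at(r)], r
        while 2 * i <= n:
            c = 2 * i
            if c < n and above(a[at(c + 1)], a[at(c)]):
                c += 1
            if not above(a[at(c)], x):
                break
            a[at(i)] = a[at(c)]
            i = c
        a[at(i)] = x

    for _ in range(n):
        filler = a[out]              # work-area key about to be displaced
        a[out] = a[at(1)]            # extract the root
        i = 2
        while i < n:                 # follow the special path
            if above(a[at(i + 1)], a[at(i)]):
                i += 1
            a[at(i // 2)] = a[at(i)]
            i *= 2
        if i == n:
            a[at(i // 2)] = a[at(n)]
            i *= 2
        a[at(i // 2)] = filler       # special leaf
        out += step


def quickheapsort(a):
    """Sort the list a in place in ascending order and return it."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = _partition_desc(a, lo, hi)
        left, right = p - lo, hi - p
        if left <= right:
            # Heap area: the larger keys; output goes to the right end.
            _heap_stage(a, lo, left, hi, -1, lambda x, y: x > y)
            a[p], a[hi - left] = a[hi - left], a[p]
            hi -= left + 1
        else:
            # Heap area: the smaller keys; output goes to the left end.
            _heap_stage(a, p + 1, right, lo, +1, lambda x, y: x < y)
            a[p], a[lo + right] = a[lo + right], a[p]
            lo += right + 1
    return a
```

Because the heap area is always the smaller side, it never holds more than half of the current range, so the work area always has room for the extracted keys, and the loop needs no stack.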

Complexity results in the average case are summarized in the theorem below. 
For simplicity, the results have been obtained only in the case in which pivots 
are chosen deterministically and without sampling, e.g. always the first element 
of the array is chosen. 

⁴ Observe that Quicksort partitions the array in the reverse way.
Lemma 4. Let $H(n) = n \log n + \alpha n$ and $f_1(n)$, $f_2(n)$ be functions of type
$\beta n + o(n)$ for all $n \in \mathbb{N}$, with $\alpha, \beta \in \mathbb{R}$. The solution to the following recurrence
equations, with initial conditions $C(1) = 0$ and $C(2) = 1$:

$$C(2n) = \frac{1}{2n}\left[(2n+1) \cdot C(2n-1) - C(n-1) + H(n-1) + f_1(n)\right], \qquad (1)$$

$$C(2n+1) = \frac{1}{2n+1}\left[2(n+1) \cdot C(2n) - C(n) + H(n) + f_2(n)\right], \qquad (2)$$

for all $n \in \mathbb{N}$, is:

$$C(n) = n \log n + (\alpha + \beta - 2.8854)\,n + o(n).$$



Proof. Among the reasonable solutions, we posit the trial solution:

$$C(n) = a\,n \log n + b\,n + c \log n \quad \text{with } a, b, c \in \mathbb{R}. \qquad (3)$$

In several of the calculations we need to manipulate expressions of the form
$\log(m + t)$ with $m \in \mathbb{N}$, $m > 1$ and $t = \pm 1$. The expansion of the natural
logarithm for small $x \in \mathbb{R}$ to second order (we do not need any further here),
$\ln(1 + x) = x - \frac{x^2}{2}$, gives $\ln(m + t) = \ln\left[m \cdot \left(1 + \frac{t}{m}\right)\right] = \ln m + \frac{t}{m} - \frac{1}{2m^2}$.
Multiplying by $s = 1/(\ln 2) \approx 1.4427$, we get:

$$\log(m + t) = \log m + \frac{st}{m} - \frac{s}{2m^2}.$$

Using such expansion in the definition of $H(n)$ and in (3), we get:

$$H(n-1) = n \log n + \alpha n - \log n - (\alpha + s) + \frac{s}{2n}, \qquad (4)$$

$$C(n-1) = a\,n \log n + b\,n + (c - a) \log n - (as + b) + s\,\frac{a - 2c}{2n}, \qquad (5)$$

$$C(2n-1) = 2a\,n \log n + 2(a + b)\,n + (c - a) \log n + (c - a - b - as) + s\,\frac{a - 2c}{4n}, \qquad (6)$$

where the lowest order terms are not considered.

Let $S = s\left(c - \frac{1}{2}a + 1\right)$; substituting (4), (5) and (6) into equation (1) and
simplifying, we find the following:

$$(1 - a)\,n \log n + (\alpha + \beta - b - 2as)\,n - \log n + (c - a - \alpha - S) + O(1/n) + o(n) = 0.$$

Examining the leading coefficients in such equality, we get the asymptotic
consistency requirements:

$$a = 1, \qquad b = \alpha + \beta - 2s.$$

Such constraints are similarly obtained expanding equation (2).

Clearly, the missing requirement about c simply means that the posited solution
does not have the exact functional form. The two leading terms of the solution
are surprisingly precise: indeed, by numerical computation, solution (3) tracks
the behavior of $C(n)$ fairly well for an extended range of n. ■
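The numerical behaviour mentioned at the end of the proof can be explored with a few lines of Python. The sketch below iterates recurrences (1) and (2) directly and compares the observed coefficient of n with the value $\alpha + \beta - 2s$ predicted by Lemma 4. The function names are ours, and taking $H(0) = 0$ is an assumption made only so that the first steps of the iteration are well defined.

```python
import math

S = 1.0 / math.log(2.0)              # s = 1/ln 2, about 1.4427


def solve_recurrence(limit, alpha, f1, f2):
    """Iterate recurrences (1) and (2) of Lemma 4 up to index limit,
    starting from C(1) = 0 and C(2) = 1. Returns the list C[0..limit]."""
    def H(n):
        # H(n) = n log n + alpha n; H(0) = 0 is our convention.
        return n * math.log2(n) + alpha * n if n > 0 else 0.0

    C = [0.0] * (limit + 1)
    if limit >= 2:
        C[2] = 1.0
    for m in range(3, limit + 1):
        n = m // 2
        if m % 2 == 0:               # recurrence (1), m = 2n
            C[m] = ((2 * n + 1) * C[2 * n - 1] - C[n - 1]
                    + H(n - 1) + f1(n)) / (2 * n)
        else:                        # recurrence (2), m = 2n + 1
            C[m] = (2 * (n + 1) * C[2 * n] - C[n]
                    + H(n) + f2(n)) / (2 * n + 1)
    return C


def linear_coefficient(C, m):
    """Observed coefficient of m once the m log m term is removed."""
    return (C[m] - m * math.log2(m)) / m
```

For instance, with `alpha = 1.88`, `f1 = lambda n: 4 * n` and `f2 = lambda n: 4 * n + 2` (the values used later for key comparisons), the predicted coefficient is `1.88 + 4 - 2 * S`, slightly below 3, and `linear_coefficient` can be compared against it for growing m.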
Theorem 2. QuickHeapsort sorts n elements in-place in $O(n \log n)$ average-case
time. Moreover, on the average, it performs no more than $n \log n + 3n$ key
comparisons and $n \log n + 2.65n$ element moves.

Proof. First, we estimate the average number of key comparisons.

Let $H_{avg}(n)$ (resp. $H'_{avg}(n)$) be the average number of key comparisons to
sort n elements with the adapted max-heap (resp. min-heap) version of
External-Heapsort (see the beginning of Section 3). Plainly, we have
$H_{avg}(i) = H'_{avg}(i) = 0$ with $i = 0, 1$.

Let $C(n)$ be the average number of key comparisons to sort n elements with
QuickHeapsort. We have $C(1) = 0$ and, for all $n \in \mathbb{N}$:

$$C(2n) = p(2n) + \frac{1}{2n}\left\{\sum_{j=1}^{n}\left[H_{avg}(j-1) + C(2n-j)\right] + \sum_{j=n+1}^{2n}\left[H'_{avg}(2n-j) + C(j-1)\right]\right\},$$

$$C(2n+1) = p(2n+1) + \frac{1}{2n+1}\left\{\sum_{j=1}^{n}\left[H_{avg}(j-1) + C(2n+1-j)\right] + \left[H_{avg}(n) + C(n)\right] + \sum_{j=n+2}^{2n+1}\left[H'_{avg}(2n+1-j) + C(j-1)\right]\right\}.$$

To compute the total average number of key comparisons we add the number of
comparisons $p(m) = m + 1$ needed to partition the array of size m to the average
number of comparisons needed to sort the two sub-arrays obtained. The index j
denotes all the possible choices (uniformly distributed) for the pivot. Obviously
$H'_{avg}(n) = H_{avg}(n)$, so by simple index manipulation, we obtain the following
recurrence equations:

$$(2n) \cdot C(2n) = (2n) \cdot p(2n) + 2\sum_{j=1}^{n}\left[H_{avg}(j-1) + C(2n-j)\right], \qquad (7)$$

$$(2n+1) \cdot C(2n+1) = (2n+1) \cdot p(2n+1) + \left[H_{avg}(n) + C(n)\right] + 2\sum_{j=1}^{n}\left[H_{avg}(j-1) + C(2n+1-j)\right]. \qquad (8)$$

They depend on the previous history but can be reduced to semi-first order
recurrences. Let (8') be the equation obtained from (8) by substituting the index
n with n − 1. Subtracting equation (7) from (8') and from (8) we obtain the
following equations:

$$(2n) \cdot C(2n) = (2n+1) \cdot C(2n-1) - C(n-1) + H_{avg}(n-1) + f_1(n),$$
$$(2n+1) \cdot C(2n+1) = (2n+2) \cdot C(2n) - C(n) + H_{avg}(n) + f_2(n),$$

where $f_1(n) = (2n) \cdot p(2n) - (2n-1) \cdot p(2n-1) = 4n$ and
$f_2(n) = (2n+1) \cdot p(2n+1) - (2n) \cdot p(2n) = 4n + 2$. By using Lemma 4 and the
upper bound in Theorem 1, it can be shown that the recurrence equation satisfies
$C(n) \le n \log n + 3n$.

Analogous recurrence equations can be written to get the average number of 
element moves. In such a case, the function $H_{avg}(n)$ (resp. $H'_{avg}(n)$) denotes the
average number of element moves to sort n elements with the adapted max-heap
(resp. min-heap) version of External-Heapsort; whereas $p(m)$ is three times the
average number $q(m)$ of exchanges used during the partitioning stage of a size
m array.

If the chosen pivot A[1] is the k-th smallest element in the array of size
m, $q(m)$ is the number of keys among A[2], ..., A[k] which are smaller than the
pivot. There are exactly t such keys with probability

$$p_{k,t} = \binom{m-k}{t}\binom{k-1}{k-1-t} \Big/ \binom{m-1}{k-1}.$$

Averaging on t and k, we get:

$$q(m) = \frac{1}{m}\sum_{k=1}^{m}\sum_{t=0}^{k-1} t\,p_{k,t} = \frac{1}{m}\sum_{k=1}^{m}\sum_{t=0}^{k-1}\frac{m-k}{\binom{m-1}{k-1}}\binom{m-k-1}{t-1}\binom{k-1}{k-1-t} = \frac{1}{6}(m - 2),$$

where the last equality is obtained by two applications of Vandermonde's
convolution.

From $p(m) = \frac{1}{2}(m - 2)$, we get $f_1(n) = 2n - \frac{3}{2}$ and $f_2(n) = 2n - \frac{1}{2}$. Thus,
Lemma 4 and the upper bound in Theorem 1 yield immediately that the average
number of element moves is no more than $n \log n + 2.65n$. ■
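For readers who want to retrace the arithmetic behind the two constants of Theorem 2, the following worked computation may help. It only combines quantities stated in the text (the cost $p(m)$ of partitioning, Lemma 4, and the average-case upper bounds of Theorem 1, used as the coefficient $\alpha$); replacing $\lfloor \log n \rfloor$ by $\log n$ in those bounds is the simplification we assume here.

```latex
\begin{align*}
\text{Moves: }\; p(m) &= 3\,q(m) = \tfrac{1}{2}(m-2),\\
f_1(n) &= 2n\,p(2n) - (2n-1)\,p(2n-1)
        = n(2n-2) - \tfrac{1}{2}(2n-1)(2n-3) = 2n - \tfrac{3}{2},\\
f_2(n) &= (2n+1)\,p(2n+1) - 2n\,p(2n)
        = \tfrac{1}{2}(2n+1)(2n-1) - n(2n-2) = 2n - \tfrac{1}{2},\\
\beta &= 2,\quad \alpha = 3.53
        \;\Longrightarrow\; \alpha + \beta - 2.8854 = 2.6446 \le 2.65.\\[4pt]
\text{Comparisons: }\; p(m) &= m + 1,\quad f_1(n) = 4n,\quad f_2(n) = 4n + 2,\\
\beta &= 4,\quad \alpha = 1.88
        \;\Longrightarrow\; \alpha + \beta - 2.8854 = 2.9946 \le 3.
\end{align*}
```

In both cases the linear coefficient given by Lemma 4 falls just below the rounded constant quoted in Theorem 2.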



4. Experimental Results 

In this section we present some empirical results concerning the performance 
of our proposed algorithm. Specifically, we will compare the number of basic 
operations and the timing results of both QuickHeapsort (QH) and its variant 
clever-QuickHeapsort (c-QH) (which implements the median of three elements 
strategy) with those of the following comparison-based sorting algorithms: 

— the classical Heapsort algorithm (H), implemented with a trick which saves 
some element moves; 

— the Bottom-Up-Heapsort algorithm (BU), implemented with bit shift oper- 
ations, as suggested in [19]; 

— the iterative version of Quicksort (i-Q), implemented as described in [3]; 

— the Quicksort algorithm (Q), implemented with bounded stack usage, as 
suggested in [5]; 

— the very efficient LEDA [12] version of clever-Quicksort (c-Q), where the 
median of three elements is used as pivot. 
Our implementations have been developed in standard C (GNU C compiler
ver. 2.7) and all experiments have been carried out on a Pentium PC (133 MHz,
32 MB RAM) with the Linux 2.0.36 operating system.

The choice to use C, rather than C++ extended with the LEDA library is 
motivated by precise technical reasons. In order to get running-times indepen- 
dent of the implementation of the data type < array > provided by the LEDA 
library, we preferred to implement all algorithms by simply using C arrays, and 
accordingly by suitably rewriting the source code supplied by LEDA for Quick- 
sort. 

Observe that all implementation tricks, as well as the various policies to 
choose the pivot, used for Quicksort can be applied to QuickHeapsort too. 

For each size $n = 10^i$, $i = 1..6$, a fixed sample of 100 input arrays has been
given to each sorting algorithm; each array in such a sample is a randomly
generated permutation of the keys 1..n. For each algorithm, the average number
of key comparisons executed, $E[C_n]$, is reported together with its relative
standard deviation, $\sigma[C_n]/n$, normalized with respect to n. Analogously, $E[A_n]$ and
$\sigma[A_n]/n$ refer to the number of element moves. Experimental results are shown
in Table 1.

They confirm pretty well the theoretical results hinted at in the previous 
section. Notice that most of the numbers quoted in Table 1 about Heapsort 
and Quicksort are in perfect agreement with the detailed experimental study of 
Moret and Shapiro [14]. 

We are mainly interested in the number of key comparisons since these
represent the dominant cost, in terms of running times, in any reasonable
implementation. Observe that, in agreement with intuition, the improvement of c-Q
relative to Q (in terms of number of key comparisons) is more significant than
that of c-QH relative to QH. With the exception of BU, when n is large enough
c-QH executes the smallest number of key comparisons, on the average; moreover,
in accordance with the theoretical results, QH always beats both Q and i-Q. It is
also interesting to note that H and BU are very stable, in the sense that they
present a small variance of the number of key comparisons.

In Table 2, we report the average running times required by each algorithm
to sort a fixed sample of 10 randomly chosen arrays of size $n = 10^i$, with $i = 4..6$.

Such results depend on the data type of the keys to be ordered (integer 
or double) and the type of comparison operation used (either built-in or via a 
user-defined function cmp). In particular, six different cases are considered. In 
the first two cases, the comparison operation used is the built-in one. In the 
third and fourth case, a simple comparison function cmp1 is used. Finally, in the
last two cases, two computationally more expensive comparison functions cmp2
and cmp3 are used (but only with keys of type integer), to simulate situations in
which the cost of a comparison operation is much higher than that of an element
move.⁵ The function cmp2 (resp. cmp3) has been obtained from a simple function

⁵ For instance, such situations arise when the array to sort contains pointers to the
actual records. A move is then just a pointer assignment, but a comparison involves
at least one level of indirection, so that comparisons become the dominant factor.
                       n = 10                                  n = 10^2
         E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n      E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n
H            39    (.21)         73    (.17)         1030    (.08)       1078    (.07)
BU           35    (.21)         73    (.17)          709    (.08)       1078    (.07)
i-Q          63    (.96)         43    (.64)          990    (.61)        685    (.26)
Q            41    (.63)         27    (.32)          868    (.64)        500    (.19)
c-Q          28    (.19)         37    (.53)          638    (.29)        617    (.20)
QH           39    (.48)         54    (.39)          806    (.58)        847    (.23)
c-QH         29    (.20)         60    (.51)          714    (.22)        870    (.22)

                       n = 10^3                                n = 10^4
         E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n      E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n
H         16848   (.031)      14074   (.024)       235370   (.010)     174198   (.007)
BU        10422   (.021)      14074   (.024)       137724   (.006)     174198   (.007)
i-Q       14471   (.605)       9146   (.106)       194279   (.878)     114419   (.092)
Q         13297   (.609)       7285   (.095)       179948   (.654)      95807   (.072)
c-Q       10299   (.355)       8543   (.102)       142443   (.401)     109141   (.065)
QH        11881   (.630)      11838   (.202)       152789   (.664)     152155   (.201)
c-QH      11135   (.333)      11959   (.182)       146643   (.323)     152909   (.121)

                       n = 10^5                                n = 10^6
         E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n      E[C_n]  σ[C_n]/n    E[A_n]  σ[A_n]/n
H       3019638  (.0031)    2074976  (.0025)     36793760  (.0010)   24048296  (.0008)
BU      1710259  (.0024)    2074976  (.0025)     20401466  (.0007)   24048296  (.0008)
i-Q     2421867  (.7037)    1374534  (.0689)     28840152  (.6192)   16068733  (.0649)
Q       2249273  (.6828)    1189502  (.0726)     27003832  (.5389)   14212076  (.0635)
c-Q     1816706  (.3367)    1328265  (.0546)     22113966  (.2962)   15649667  (.0497)
QH      1869769  (.6497)    1854265  (.2003)     21891874  (.6473)   21901092  (.1853)
c-QH    1799240  (.3254)    1866359  (.1675)     21355988  (.3282)   21951600  (.1678)

Table 1. Average number of key comparisons and element moves (sample size
= 100).



cmp1 by adding one call (resp. two calls) to the function log of the C standard
mathematical library.

For each case considered, an approximation of the average times required by 
a single key comparison tc and by a single element move tm is also reported. 

Table 2 confirms the good behaviour of all Quicksort variants i-Q, Q, c- 
Q; moreover, we can see that BU suffers from higher overhead due to internal 
bookkeeping. In most cases the running-times of QH and c-QH are between those 
of the variants of Heapsort, H and BU, and those of the variants of Quicksort, 
Q, i-Q, and c-Q. 

For each trial, the best running time (represented as boxed value in Table 
2) is always achieved by a “clever” algorithm, namely either c-Q or c-QH. In 
particular, when each key comparison operation is computationally expensive, 
c-QH turns out to be the best algorithm, on the average, in terms of running 
times. 
Table 2. Average running times in seconds (sample size = 10).



Cache performance has considerably less influence on the behaviour of sorting 
algorithms than does paging performance (cf. [14], Chap. 8); for such reason, we 
believe that we can ignore completely possible negative effects due to caching. 

Concerning virtual memory problems, i.e. demand paging, all Quicksort al- 
gorithms show good locality of reference, whereas Heapsort algorithms, and also 
QuickHeapsort algorithms, tend to use pages that contain the top of the heap 
heavily, and to use in a random manner pages that contain the bottom of the 
heap (cf. [14]). Such observation allows us to conclude that an execution of c-Q 
cannot be more penalized than an execution of c-QH by delays due to paging 
problems. Hence, we can reasonably conclude that the success of c-QH is not 
due to paging performance. 



5. Conclusions 

We presented QuickHeapsort, a new practical “in-place” sorting algorithm ob- 
tained by merging some characteristics of Bottom-Up-Heapsort and Quicksort. 
Both theoretical analysis and experimental tests confirm the merits of Quick- 
Heapsort. 

The experimental results obtained show that it is convenient to use clever- 
QuickHeapsort when the input size n is large enough and each key comparison 
operation is computationally expensive. 
Abstract. It is known that some triangulation graphs admit straight-line
drawings realizing certain characteristics, e.g., greedy triangulation,
minimum-weight triangulation, Delaunay triangulation, etc. Lenhart and Liotta [12], in
their pioneering paper on “drawable” minimum-weight triangulations, raised an
open problem: ‘Does every triangulation graph whose skeleton is a forest admit
a minimum-weight drawing?’ In this paper, we answer this question in the
negative, both in the general case and even when the skeleton is restricted to a
tree or, in particular, a star.



Keywords: Graph drawing. Minimum-weight triangulation. 



1 Introduction 

Drawing of a graph on the plane is a pictorial representation commonly used in many
applications. A “good” graph drawing has some basic characteristics [4], e.g.,
planarity, straight-line edges, etc. One of the problems facing graph drawing is where
to place the graph vertices on the plane, so as to realize these characteristics. For
example, the problem of Euclidean minimum spanning tree (MST) realization is to
locate the tree vertices such that the minimum spanning tree of these vertices is
isomorphic to the given tree. However, not all trees have an MST realization: it can be
shown easily that there is no MST realization of any tree with a vertex of degree 7 or
more. In fact, the MST realization of a tree with maximum vertex degree 6 is
NP-complete [6].

Recently, researchers have paid a great deal of attention to the graph drawing of
certain triangulations. A planar graph G=(E, V) is a triangulation, if all faces of G are
bounded by exactly three edges, except for one which may be bounded by more than
three edges, and this face is called the outerface. A minimum-weight triangulation
realization of G=(E, V) is to place V in the plane so that the minimum weight
triangulation of V, (MWT(V)), is isomorphic to G. An excellent survey on drawability
and realization for general graphs can be found in [2] and a summary of results related
to our work can be found in the following table.





    Graph                Realization                     Result

1   Planar Graph         Straight-line drawing           Always possible [8]

2   Tree                 Minimum spanning tree           Maximum vertex degree
                                                         ≤ 5 polynomial time
                                                         = 6 NP-complete [6]
                                                         > 6 non-drawable [15]

3   Triangulation        Delaunay triangulation          Drawable and non-drawable
                                                         conditions [5]

4   Maximal              Minimum-weight                  Linear time [11]
    Outerplanar Graph    triangulation

5   Triangulation        Minimum-weight                  Non-drawable condition [12]
                         triangulation

6   Maximal              Maximum-weight                  Non-drawable condition [17]
    Outerplanar Graph    triangulation

7   Caterpillar Graph    Inner edges of maximum-         Linear time [17]
                         weight triangulation of a
                         convex point set



Table 1 



(1) Every planar graph has a straight-line drawing realization [8]. 

(2) Monma and Suri [15] showed that a tree with maximum vertex degree of more 
than six does not admit a straight-line drawing of minimum spanning tree. Eades 
and Whitesides [6] proved that the realization of Euclidean minimum spanning 
trees of maximum vertex degree six is NP-hard. 

(3) Dillencourt [5] presented a necessary condition for a triangulation admitting a 
straight-line drawing of Delaunay triangulation and also a condition for non- 
drawability. 

(4) Lenhart and Liotta [11] studied the minimum-weight drawing for a maximal 
outerplanar graph, and discovered a characteristic of the minimum-weight 
triangulation of a regular polygon using the combinatorial properties of its dual 
trees. With this characteristic, they devised a linear-time algorithm for the 
drawing. 

(5) Lenhart and Liotta [12] further demonstrated some examples of ‘non-drawable’ 
graphs for minimum-weight realizations, and also proved that if any graph 
contains such non-drawable subgraph, then it is not minimum-weight drawable. 

(6) Wang et al. studied the maximum-weight triangulation and graph drawing; a
simple condition for non-drawability of a maximal outerplanar graph is given in
[17].

(7) A caterpillar is a tree such that all internal nodes connect to at most 2 non-leaf
nodes. Wang et al. [17] showed that caterpillars are always linear-time realizable
by the inner edges of the maximum-weight triangulation of a convex point set.

In this paper, we investigate the open problem raised by Lenhart and Liotta [12] to 
determine whether or not every triangulation whose ‘skeleton’ is a forest admits a 
minimum-weight drawing. The skeleton of a triangulation graph is the remaining 
graph after removing all the boundary vertices and their incident edges on the 
outerface. Intuitively, the answer to this open problem seems to be affirmative by 
adapting the same idea in the drawing of wheel graphs or k-spined graphs [12]. That 
is, one can stretch the vertices of a tree in the forest-skeleton arbitrarily far apart from
each other as well as from other trees. In this manner, all the vertices in the
forest-skeleton would be “isolated” from each other. The edges of the trees would be
minimum-weight and the edges connecting the removed vertices would also be
minimum-weight, in the hope that the “long distance” enforces such a localization.
However, this intuition turns out to be false, as the removed part of the graph plays an
indispensable role in the MWT. As a matter of fact, there exist some minimum-weight
non-drawable triangulations whose skeleton is a forest or a tree. It is worth noting
that the proof of some graphs being ‘non-drawable’ is similar to the proof of a lower
bound of a problem, which requires some non-trivial observation. In Section 3, we
derive a combinatorial non-drawability sufficient condition for any minimum weight 
triangulation. Then we apply this condition to show that some triangulations with 
forest skeletons are not minimum-weight drawable. In Section 4, we further disprove 
the conjecture by showing the existence of a tree-skeleton triangulation, in particular, 
a star-skeleton triangulation which is not minimum-weight drawable. In Section 5, we 
conclude our work. 



2 Preliminaries 

Definition 1: Let S be a set of points in the plane. A triangulation of S, denoted by
T(S), is a maximal set of non-crossing line segments with their endpoints in S.

The weight of a triangulation T(S) is given by $\omega(T(S)) = \sum_{s_i s_j \in T(S)} d(s_i, s_j)$, where
$d(s_i, s_j)$ is the Euclidean distance between $s_i$ and $s_j$ of S. A minimum-weight
triangulation of S, denoted by MWT(S), is defined as, for all possible T(S),
$\omega(MWT(S)) = \min\{\omega(T(S))\}$. □

Property 1: (Implication property)

A triangulation T(S) is called k-gon local minimal or simply k-minimal, denoted by
$T_k(S)$, if any k-gon extracted from T(S) is a minimum-weight triangulation for this
k-gon. Let ‘$a \supset b$’ denote ‘a implies b’ and a contains b. Then the following implication
property holds:

$$MWT(S) \supset T_{n-1}(S) \supset \cdots \supset T_4(S) \supset T(S). \qquad □$$
Figure 1(a) illustrates an example which is 4-minimal but not 5-minimal. Note in the
figure that every quadrilateral has a minimum-weight triangulation but not the
pentagon abdef. So Figure 1(a) is a $T_4$ but not a $T_5$, nor an MWT. On the other hand,
Figure 1(b) gives the MWT of the same vertex set.





(b): MWT 



Figure 1: 4-minimal but not minimum 



Definition 2: Let e be an internal edge of any triangulation. Then, e is a diagonal of
a quadrilateral inside the triangulation, say abcd with e = (a, c). Angles $\angle abc$ and
$\angle cda$ are called facing angles w.r.t. e. Note that each internal edge of a triangulation
has exactly two facing angles (Figure 2). □

Property 2: Let $\triangle abc$ be a triangle in the plane and d be an internal vertex in $\triangle abc$.
Then, at most one of $\angle adb$, $\angle bdc$, and $\angle cda$ can be acute. □

Lemma 1: Let abcd denote a quadrilateral with diagonal (a, c) and with two obtuse
facing angles, $\angle abc$ and $\angle cda$. If such a quadrilateral always exists in any drawing
of a given triangulation, then this triangulation is minimum-weight non-drawable, in
particular, 4-minimal non-drawable.




Figure 2: For the proof of Lemma 1 
Proof: Since both facing angles $\angle abc$ and $\angle cda$ of edge (a, c) are greater than 90°,
the quadrilateral abcd must be convex (refer to Figure 2). Then, edge (b, d) lies inside
abcd and (b, d) is shorter than (a, c), so quadrilateral abcd is not 4-minimal. Since such a
quadrilateral always exists in any drawing of the triangulation by the premise, the
triangulation is neither 4-minimal nor minimum-weight drawable. □
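The geometric fact used in the proof (two obtuse facing angles force the other diagonal to be shorter) can be spot-checked numerically. The sketch below is only an illustration under our own assumptions: the diagonal (a, c) is fixed on the x-axis, b and d are sampled on opposite sides of it, and all function names are hypothetical. An obtuse angle at b means that b lies inside the circle with diameter (a, c), and likewise for d, which is why the distance between b and d stays below the length of (a, c).

```python
import math
import random


def angle_deg(p, q, r):
    """Angle at vertex q of the path p-q-r, in degrees."""
    ux, uy = p[0] - q[0], p[1] - q[1]
    vx, vy = r[0] - q[0], r[1] - q[1]
    cos_q = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_q))))


def length(p, q):
    """Euclidean distance between points p and q."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def sample_quadrilateral(rng):
    """Sample a quadrilateral abcd with diagonal (a, c) whose facing
    angles at b and d are both obtuse (rejection sampling)."""
    a, c = (-1.0, 0.0), (1.0, 0.0)
    while True:
        b = (rng.uniform(-1.0, 1.0), rng.uniform(0.05, 1.0))
        d = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, -0.05))
        if angle_deg(a, b, c) > 90 and angle_deg(c, d, a) > 90:
            return a, b, c, d
```

Swapping diagonal (a, c) for (b, d) in any sampled quadrilateral therefore lowers the total weight, which is exactly why such a quadrilateral cannot appear in a 4-minimal triangulation.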



3 Forest-Skeleton Triangulations 

In this section, we shall give a combinatorial sufficient condition for a triangulation to
be minimum-weight non-drawable. With the condition, we can prove that there exists
a forest-skeleton triangulation which is minimum-weight non-drawable, and thus disprove the
conjecture by Lenhart and Liotta [12].



3.1 Non-drawable Condition for Minimum-Weight Triangulations

In the following, we shall provide a combinatorial sufficient condition for a 

triangulation to be 4-minimal non-drawable. 

N4o-Condition: Let G be a triangulation such that

(1) G contains a simple circuit C with a non-empty set V of internal vertices.

(2) Inside C, let V' denote the subset of V such that each element in V' is of
degree three; each element in V'' = V − V' is of degree more than three; and
let f be the number of faces after the removal of the vertices in V' and their
incident edges. Then, G satisfies the following conditions:

(i) |V''| > 1, and

(ii) f < |V'| + (|V''| − 1)/2. □

It is easy to see that no two vertices in V' are adjacent to each other and thus |V'| ≤ f.
Figure 3 gives a subgraph which satisfies the N4o-Condition: |V'| = 11, |V''| = 4, and
f = 12 < |V'| + (|V''| − 1)/2 = 12.5.
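
Parts (i) and (ii) depend only on three counts, so they are easy to test once those counts are known. A minimal sketch follows; the function and parameter names are hypothetical, and it assumes part (1) has already been established, i.e. that a suitable circuit C has been identified.

```python
def satisfies_n4o(v_prime, v_double_prime, f):
    """Check parts (i) and (ii) of the N4o-Condition.

    v_prime        -- |V'|, number of internal vertices of degree three
    v_double_prime -- |V''|, number of internal vertices of degree > 3
    f              -- number of faces inside C after removing V'
    """
    return v_double_prime > 1 and f < v_prime + (v_double_prime - 1) / 2

# Counts quoted in the text for Figure 3: |V'| = 11, |V''| = 4, f = 12.
print(satisfies_n4o(11, 4, 12))
```

The same test accepts the counts |V'| = 6, |V''| = 2, f = 6 used for the forest-skeleton example in Section 3.2, since 6 < 6.5.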

Lemma 2: Let G be a triangulation. If G satisfies the N4o-Condition, then G is 4- 
minimal non-drawable. 

Proof: Let $G_C$ denote the portion of $G$ enclosed by $C$. Let $G'_C$ denote the remaining
graph of $G_C$ after the removal of $V'$ (the vertices of degree three) as well as their
incident edges. In $G'_C$, let $f$, $e$, and $n$ denote the number of faces, the number of
edges, and the number of vertices, respectively; let $f'$ denote the number of faces
originally not containing any vertex of $V'$; let $e'$ denote the number of edges not lying
on $C$; and let $n_C$ denote the number of vertices on $C$. Since all faces in $G'_C$ are triangles,
we have

$$f = \frac{2(e - n_C) + n_C}{3}.$$

Together with Euler's formula on $G'_C$, $f - e + n = 1$, we have that

$$e = 3n - 3 - n_C, \qquad f = 2n - 2 - n_C \qquad (1).$$

By part (ii) of the N4o-Condition, $f < |V'| + (|V''| - 1)/2$. As $f = f' + |V'|$, we get
$f' + |V'| < |V'| + (|V''| - 1)/2$. Then, $f' < (|V''| - 1)/2$, or $2f' < |V''| - 1$. Noting that
$|V''| = n - n_C$, we have

$$2f' < n - n_C - 1 \qquad (2).$$

As $e - n_C = e'$, we have by (1) and (2) that

$$f + 2f' < e' \qquad (3).$$

Note that an edge of $G'_C$ not on $C$ can be regarded as the diagonal of a quadrilateral in
$G_C$. As every diagonal has two facing angles and $G'_C$ contains $e'$ internal edges, there
are exactly $2e'$ facing angles. Moreover, $G'_C$ contains $f$ faces; $f - f'$ of them have a
white node inside and thus each of these faces contributes at most one acute facing
angle (Property 2). On the other hand, each of the $f'$ faces contributes at most three
acute facing angles. Thus, the total number of acute facing angles for these $e'$ interior
edges in $G_C$ is at most $f - f' + 3f'$ ($= f + 2f'$). By (3), the number of internal edges is
greater than the number of acute facing angles in $G_C$. Thus, at least one of the $e'$
internal edges (diagonals) is not associated with an acute facing angle and must have
two obtuse facing angles. By Lemma 1, $G_C$ cannot admit a 4-minimal drawing. □
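
Inequality (3) follows from (1) and (2) by direct substitution. Written out as a check, with the symbols as defined in the proof:

```latex
\begin{align*}
f + 2f' &< (2n - 2 - n_C) + (n - n_C - 1) && \text{by (1) and (2)} \\
        &= 3n - 3 - 2n_C \\
        &= (3n - 3 - n_C) - n_C \;=\; e - n_C \;=\; e'.
\end{align*}
```

With the counts given for Figure 3 (f = 12, f' = 1, e' = 15), this reads 12 + 2 = 14 < 15.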

By Property 1 and Lemma 2, we have 

Theorem 1: Let G be a triangulation. If G satisfies the N4o-Condition, then G is 
minimum-weight non-drawable. □ 

Note that Theorem 1 is applicable to any triangulation G by treating the hull of G as
the circuit C stated in the N4o-Condition. Refer to Figure 3.



3.2 A Minimum-Weight Non-drawable Example for a Forest-Skeleton 
Triangulation 

We shall construct a triangulation whose skeleton is a forest and which satisfies the
N4o-Condition. Then, the non-drawable claim follows from Theorem 1, which
settles the open problem: not all triangulations with a forest-skeleton are
minimum-weight drawable.
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Figure 3: An example of a 4-optimal non-drawable triangulation. The darken vertices are of 
degree more than three and the white vertices are of degree three. The sizes of V' and V'' are
11 and 4 respectively, n = 10, e = 21, f = 12, $n_C$ = 6, f' = 1, e' = 15.



Theorem 2: There exists a triangulation with a forest-skeleton which is not
minimum-weight drawable.

Proof: The triangulation shown in Figure 4 has a forest-skeleton (the darkened edges).
It contains a simple cycle $C = (v_1, v_2, v_3, v_4)$. Inside C, |V''| = 2, i.e., $\{v_{11}, v_{12}\}$;
|V'| = 6, i.e., $\{v_5, v_6, v_7, v_8, v_9, v_{10}\}$; and f = 6 (the number of faces after the
removal of V'). Thus, part (i) of the N4o-Condition, |V''| > 1, is satisfied, and part (ii),
f < |V'| + (|V''| − 1)/2, is also satisfied since 6 < 6.5. Then, G is not minimum-weight
drawable by Theorem 1. □




Figure 4: A non-drawable forest-skeleton triangulation 
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4 Tree-Skeleton Triangulation 

In this section, we shall show that there exist tree-skeleton triangulations which are
minimum-weight non-drawable, further disproving the claim in [12]. Let us consider a
triangulation in the plane with two adjacent triangles, △abc and △bcd, each of which
has an internal vertex with degree 3, as shown in Figure 5. By Property 2,
each of the internal degree-3 vertices can contribute at most one acute angle. In the
following, we shall prove that if the only acute angle in △abc, ∠aeb, and that in △bcd,
∠bfd, are facing edge (a, b) and edge (b, d) respectively, then by Lemma 1, the
triangulation with these two adjacent triangles is not minimum-weight drawable. This
is because e and f will be on the quadrilateral bfce, and edge (e, f) crosses edge (b, c)
and is shorter than edge (b, c).




Figure 5: A non-drawable case



Let us consider a convex polygon P with n ≥ 13 vertices. We shall show that P has at
least 3 consecutive inner angles greater than 90°.

Lemma 3: No convex polygon can have more than 4 acute inner angles.

Proof: If a convex n-gon has 5 or more acute angles, then the sum of its inner angles is
less than 180°(n − 5) + 5 × 90° = 180°n − 900° + 450° = 180°n − 450° = 180°(n − 2)
− 90° < 180°(n − 2). This contradicts the fact that the sum of the inner angles of a convex
n-gon must be 180°(n − 2). □

Lemma 4: If P is a convex polygon with n ≥ 13 vertices, then P has at least 3
consecutive obtuse inner angles.

Proof: The proof is by contradiction. Assume P has at most 2 consecutive obtuse
inner angles. Then every maximal run of obtuse angles has length at most 2, so at least
⌈n/3⌉ ≥ 5 acute inner angles are needed to separate the obtuse inner angles in P for
n ≥ 13. This contradicts Lemma 3. □
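
The counting behind Lemma 4 can be checked exhaustively for small n. The sketch below is purely combinatorial and makes no claim about geometric realizability: it places at most 4 separating (non-obtuse) angles on a cycle of n positions, as permitted by Lemma 3, and reports the smallest achievable value of the longest run of obtuse angles. All function names are illustrative.

```python
from itertools import combinations

def longest_cyclic_run(flags):
    """Length of the longest cyclic run of True values in flags."""
    n = len(flags)
    if all(flags):
        return n
    best = run = 0
    # Scanning the sequence twice handles runs that wrap around.
    for x in list(flags) + list(flags):
        run = run + 1 if x else 0
        best = max(best, run)
    return min(best, n)

def min_longest_obtuse_run(n, max_acute=4):
    """Minimum, over all placements of at most max_acute separators
    on an n-cycle, of the longest run of obtuse positions."""
    worst = n
    for k in range(max_acute + 1):
        for acute_positions in combinations(range(n), k):
            flags = [i not in acute_positions for i in range(n)]
            worst = min(worst, longest_cyclic_run(flags))
    return worst

for n in (12, 13, 14):
    print(n, min_longest_obtuse_run(n))
```

For n = 12 the pattern "one acute, two obtuse" repeated four times keeps every run at length 2, which is why the lemma needs at least 13 vertices.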




Triangulations without Minimum- Weight Drawing 







Figure 6: A non-drawable tree-/star-skeleton triangulation



Theorem 3: There exists a tree-skeleton (star-skeleton) triangulation which does not
admit a minimum-weight drawing.

Proof: Refer to Figure 6. By Lemma 4, P contains at least three consecutive obtuse
inner angles. Without loss of generality, let ∠a', ∠b', and ∠a'bb' be three
consecutive obtuse inner angles of P. In order for P to be drawable, the region
caa'bb'd must also be drawable. It follows that edges (a, b), (b, d), and (b, c) must be
drawable. Note that ∠aeb and ∠bfd must be acute since ∠a' and ∠b' are already
obtuse. Then, the angle ∠bec in triangle △abc and the angle ∠bfc in triangle △bcd
must be obtuse by Property 2. These are the two facing angles of edge (b, c), so by
Lemma 1 edge (b, c) cannot be an edge in any minimum-weight drawing. As the graph
is a star-skeleton triangulation and a star is a special case of a tree, the theorem also
applies to tree-skeleton triangulations. □



5 Conclusion 

In this paper, we investigated the minimum-weight drawability of triangulations. We
showed that triangulations with forest-skeletons, or even with tree-/star-skeletons, are
not always minimum-weight drawable, disproving the conjecture of [12]. Furthermore, we
found that, in addition to wheel graphs and k-spined graphs, a subclass of star-skeleton
graphs, the regular star skeleton graphs, is minimum-weight drawable. This is sketched
in the following appendix.



Appendix: Drawable Triangulation with Minimum-Weight 

In this section, we shall show that a special class of triangulations, namely regular
star skeleton graphs, admits a minimum-weight drawing.
Definition 3: There exist only three types of edges in a triangulation whose skeleton
is a forest, namely, (1) skin-edge (simply, s-edge): both vertices of the edge are on
the hull, (2) tree-edge (simply, t-edge): both vertices of the edge are not on the hull,
and (3) bridge-edge (simply, b-edge): one vertex of the edge is on the hull and the
other is not. A base skin-edge is an s-edge in the innermost layer of s-edges. A graph G
is a regular star skeleton graph, denoted by RSSG, if G has a star skeleton, G contains
only base skin-edges, and all the b-edges of a branch on the same side are connected to
the vertex of its neighboring b-edge.

By the definition, an RSSG can always be decomposed into a wheel and several k- 
spined triangles, where k can be different for different triangles. Each k-spined 
triangle consists of two fans, and the apex of a fan is a vertex of b-edge in the wheel 
and the boundary of the fan is a branch of the star-skeleton. We shall give a high-level 
description of the algorithm. 

Algorithm: The algorithm first tests whether G is an RSSG. If it is, it labels the
b-edges and t-edges for the wheel and the fans of this RSSG. For a given resolution of
the drawing, we can determine the size of a fan as a function of the number of
radial edges, k, and the distance δ between the apex and its closest boundary vertex.
Now, the algorithm draws a wheel. During the arrangement of its radial edges, we
take the size of the attached fans into account. There are two types of drawings for
fans: FAN1 and FAN2.

FAN1: Let v be the apex of a fan and $(v_1, v_2, \ldots, v_k)$ be the sequence of vertices on its
boundary (the interior vertices of the (k−1)-spined triangle). The drawing is similar to
that in [12] (refer to Lemma 7 of [12]).

FAN2: The apex v' lies on the opposite side of the apex v along $(v_1, v_2, \ldots, v_k)$. Since all
the edges in FAN2 are stable edges, they belong to MWT(S) [15].

Theorem 4. Graph RSSG is minimum-weight drawable. 

Proof Sketch: We shall prove that each edge of the drawing belongs to the MWT of
this point set. There are three types of edges in the drawings: the base s-edges, the
b-edges, and the t-edges. The s-edges obviously belong to the MWT of this point set
since all these edges are on the convex hull of this point set. Let us consider b-edges
and t-edges. By our algorithm and by Lemma 7 in [12], all individual fans belong to
their own MWTs respectively. We can show that they also belong to the final MWT by
proving that all the b-edges separating them belong to MWT(S) (using the local
replacing argument [16]).
Abstract. Given a boolean 2CNF formula F, the Max2Sat problem
is that of finding the maximum number of clauses satisfiable simultaneously.
In the corresponding decision version, we are given an additional
parameter k and the question is whether we can simultaneously satisfy
at least k clauses. This problem is NP-complete. We improve on known
upper bounds on the worst case running time of Max2Sat, implying
also new upper bounds for Maximum Cut. In particular, we give experimental
results, indicating the practical relevance of our algorithms.
Keywords: NP-complete problems, exact algorithms, parameterized
complexity, Max2Sat, Maximum Cut.



1 Introduction 

The (unweighted) Maximum Satisfiability problem (MaxSat) is to assign values
to boolean variables in order to maximize the number of satisfied clauses
in a CNF formula. Restricting the clause size to two, we obtain Max2Sat.
When turned into “yes-no” problems by adding a goal k representing the number
of clauses to be satisfied, MaxSat and Max2Sat are NP-complete [7].
Efficient algorithms for MaxSat, as well as Max2Sat, have received consid- 
erable interest over the years [2]. Furthermore, there are several papers which 
deal with Max2Sat in detail, e.g., [3,4,6]. These papers present approximation 
and heuristic algorithms for Max2Sat. In this paper, by way of contrast, we 
introduce algorithms that give optimal solutions within provable bounds on the 
running time. The arising solutions for Max2Sat are both fast and exact and 
show themselves to be interesting not only from a theoretical point of view, but 
also from a practical point of view due to the promising experimental results we 
have found. 

The following complexity bounds are known for Max2Sat: There is a de- 
terministic, polynomial time approximation algorithm with approximation fac- 
tor 0.931 [6]. On the other hand, unless P = NP, the approximation factor 
cannot be better than 0.955 [9]. With regard to exact algorithms, research so 
far has concentrated on the general MaxSat problem [1,12]. As a rule, the al- 
gorithms which are presented there (as well as our own) are based on elaborate 
case distinctions. Taking the case distinctions in [12] further, Bansal and Raman [1]
have recently presented the following results: Let $|F|$ be the length of
the given input formula and $K$ be the number of clauses in $F$. Then MaxSat
can be solved in times $O(1.3413^K |F|)$ and $O(1.1058^{|F|})$. The latter result implies
that, using $|F| \le 2K$, Max2Sat can be solved in time $O(1.2227^K)$, this being
the best known result for Max2Sat so far. Moreover, Bansal and Raman have
shown that, given in advance the number $k$ of clauses which are to be satisfied,
MaxSat can be solved in $O(1.3803^k k^2 + |F|)$ time.

Our main results are as follows: Max2Sat can be solved in times
$O(1.0970^{|F|})$, $O(1.2035^K)$, and $O(1.2886^k k + |F|)$, respectively. In addition, we
show that if each variable in the formula appears at most three times, then
Max2Sat, still NP-complete, can be solved in time $O(1.2107^k |F|)$. In reference
to modifications of our algorithms done in [8], we find that Maximum Cut in
a graph with $n$ vertices and $m$ edges can be solved in time $O(1.3197^m)$. If restricted
to graphs with vertex degree at most three, it can be solved in time
$O(1.5160^n)$, and, if restricted to graphs with vertex degree at most four, in time
$O(1.7417^n)$. In addition, the same algorithm computes a Maximum Cut of size
at least $k$ in time $O(m + n + 1.7445^k k)$, improving on the previous time bounds
of $O(m + n + 4^k k)$ [11] and $O(m + n + 2.6196^k k)$ [12].
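
The bound in the number of clauses and the degree-restricted Maximum Cut bounds are consistent with simple counting. The check below is an added observation, not a derivation taken from [8]; it assumes only that a 2CNF formula with K clauses has length at most 2K and that a graph of maximum degree d has at most dn/2 edges.

```latex
\begin{align*}
|F| \le 2K &\;\Rightarrow\; 1.0970^{|F|} \le 1.0970^{2K} \approx 1.2034^{K},\\
\deg \le 3 \;\Rightarrow\; m \le \tfrac{3n}{2} &\;\Rightarrow\; 1.3197^{m} \le 1.3197^{3n/2} \approx 1.5160^{n},\\
\deg \le 4 \;\Rightarrow\; m \le 2n &\;\Rightarrow\; 1.3197^{m} \le 1.3197^{2n} \approx 1.7416^{n}.
\end{align*}
```

The small differences from the stated 1.2035 and 1.7417 are presumably due to rounding of the bases.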

Aside from the theoretical improvements gained by the new algorithms we 
have developed, an important contribution of our work is also to show the prac- 
tical significance of the results obtained. Although our algorithms are based on 
elaborate case distinctions which show themselves to be complicated upon anal- 
ysis, they are relatively easy to apply when dealing with the number of cases 
the actual algorithm has to distinguish. Unlike what is known for the general 
MaxSat problem [1,12], we thereby have for Max2Sat a comparatively small 
number of easy to check cases, making our implementation practical. Moreover, 
analyzing the frequency of how often different rules are applied, our experiments 
also indicate which rules might be the most valuable ones. Our algorithms can 
compete well with heuristic ones, such as the one described by Borchers and 
Furman [3]. 

Independently of our work, Hirsch [10] has simultaneously developed upper
bounds for the Max2Sat problem. He presents an algorithm with bounds
of $O(1.0905^{|F|})$ with respect to the formula length $|F|$ and $O(1.1893^K)$ with
respect to the number of clauses $K$, which are better than the bounds shown
for our algorithms. Moreover, he points out that his algorithm also works for 
weighted versions of Max2Sat. On the other hand, however, he does not give 
any bound with respect to k, the number of satisfiable clauses. His analysis is 
simpler than ours, as he makes use of a result by Yannakakis [15]. The algo- 
rithm itself, however, seems much more complex and is not yet accompanied by 
an implementation. The reduction step of Hirsch’s algorithm has a polynomial 
complexity, as a maximum flow computation has to be done, and it would be 
interesting to see whether this will turn out to be efficient in practice. 

Due to the lack of space, we omitted several details and refer to [8] for more 
material. 
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2 Preliminaries and Transformation Rules 

We use primarily the same notation as in [1,12]. We study boolean formulas in
2CNF, represented as multisets of sets (clauses). A subformula, i.e., a subset
of clauses, is called closed if it is a minimal subset of clauses allowing no
variable within the subset to occur outside of this subset as well. A clause that
contains the same variable positively and negatively, e.g., $\{x, \bar{x}\}$, is satisfied by
every assignment. We will not allow for these clauses here, and assume that
such clauses are always replaced by a special clause T, denoting a clause that
is always satisfied. We call a clause containing r literals simply an r-clause.
Its length is therefore r. A formula in 2CNF is one consisting of 1- and
2-clauses. We assume that 0-clauses do not appear in our formula, since they are
clearly unsatisfiable. The length of a clause is its cardinality, and the length of
a formula is the sum of the lengths of its clauses. Let $l$ be a literal occurring
in a formula F. We call it an $(i, j)$-literal if the variable corresponding to $l$
occurs exactly $i$ times as $l$ and exactly $j$ times as $\bar{l}$. Analogously, we obtain
$(i^+, j)$-, $(i, j^+)$-, and $(i^+, j^+)$-literals by replacing “exactly” with “at least” at the
appropriate positions, and get $(i^-, j)$-, $(i, j^-)$-, and $(i^-, j^-)$-literals by replacing
“exactly” with “at most”. Following Bansal and Raman [1], we call an $(i, j)$-literal
$l$ an $(i, j)[p_1, \ldots, p_i][n_1, \ldots, n_j]$-literal if the clauses containing $l$ are of
length $p_1 \le \cdots \le p_i$ and those containing $\bar{l}$ are of length $n_1 \le \cdots \le n_j$. For a
literal $l$ and a formula F, let $F[l]$ be the formula originating from F by replacing
all clauses containing $l$ with T and removing $\bar{l}$ from all clauses where it occurs.
We say $x$ occurs in a clause $C$ if $x \in C$ or $\bar{x} \in C$. We write $\#_x$ for the number
of occurrences of $x$ in the formula. Should variable $x$ and variable $y$ occur in the
same clause, we call this instance a common occurrence and write $\#_{xy}$ for the
number of their common occurrences in the formula. In the same way, we write
$\#_{xy}$ for the number of common occurrences of literals $x$ and $y$.

As with earlier exact algorithms for MaxSat [1,12], our algorithms are re- 
cursive. They go through a number of transformations and branching rules, 
where the given formula is simplified by assigning boolean values to some carefully
selected variables. The fundamental difference between transformation and
branching rules is that a transformation rule replaces a formula by one simpler
formula, whereas a branching rule replaces it by at least two simpler formulas.
The asymptotic complexity of the algorithm is governed
by the branching rules. We will use recurrences to describe the size of the
corresponding branching trees created by our algorithms. Therefore, we will apply
one of the transformation rules whenever possible, as they avoid a branching of
the recursion.

In the rest of this section, we turn our attention to the transformation rules. 
Our work here follows that of [12] closely, as the first 4 rules have also been 
used there. Their correctness is easy to check. 

1. Pure Literal Rule: Replace F with $F[l]$ if $l$ is a $(1^+, 0)$-literal.

2. Dominating 1-Clause Rule: If $\bar{l}$ occurs in $i$ clauses and $l$ occurs in at least $i$
1-clauses of F, then replace F with $F[l]$.

3. Complementary 1-Clause Rule: If $F = \{\{x\}, \{\bar{x}\}\} \cup G$, then replace F with G,
increasing the number of satisfied clauses by one.

4. Resolution Rule: If $F = \{\{x, l_1\}, \{\bar{x}, l_2\}\} \cup G$ and G does not contain x, then
replace F with $\{\{l_1, l_2\}\} \cup G$, increasing the number of satisfied clauses by
one.

5. Almost Common Clauses Rule: If $F = \{\{x, y\}, \{x, \bar{y}\}\} \cup G$, then replace F
with $\{\{x\}\} \cup G$, increasing the number of satisfied clauses by one.

6. Three Occurrence Rules: We consider two subcases:

(a) If x is a (2, 1)-literal, $F = \{\{x, y\}, \{x, y\}, \{\bar{x}, \bar{y}\}\} \cup G$, and G does not
contain x, then replace F with G, increasing the number of satisfied
clauses by three.

(b) If x is a (2, 1)-literal, and either $F = \{\{x, y\}, \{x, y\}, \{\bar{x}, l\}\} \cup G$ or
$F = \{\{x, y\}, \{x, l\}, \{\bar{x}, \bar{y}\}\} \cup G$, then replace F with $\{\{y, l\}\} \cup G$ or
$\{\{\bar{y}, l\}\} \cup G$, respectively, increasing the number of satisfied clauses by two.

The Almost Common Clauses Rule was introduced by Bansal and Raman [1].
In the rest of this paper, we will call a formula reduced if none of the above
transformation rules can be applied to it. The correctness of many of the branching
rules that we will present relies heavily on the fact that we are dealing with
reduced formulas.
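
To make the operation F[l] and two of the simpler transformation rules concrete, here is a minimal sketch. The representation is an assumption made for illustration: literals are nonzero integers (a negative sign denotes negation), clauses are frozensets, and a formula is a list of clauses so that repeated clauses are kept. Satisfied clauses are counted and dropped instead of being replaced by the special clause T. This is not the authors' implementation.

```python
from collections import Counter

def assign(formula, lit):
    """Return (F[lit], number of clauses satisfied by making lit true)."""
    rest, satisfied = [], 0
    for clause in formula:
        if lit in clause:
            satisfied += 1            # clause becomes the always-true clause T
        else:
            reduced = clause - {-lit}  # drop the falsified literal
            if reduced:                # an emptied clause can never be satisfied
                rest.append(reduced)
    return rest, satisfied

def pure_literal_rule(formula):
    """Apply the Pure Literal Rule once, if some literal never occurs negated."""
    literals = {l for clause in formula for l in clause}
    for l in sorted(literals):
        if -l not in literals:
            return assign(formula, l)
    return formula, 0

def complementary_unit_rule(formula):
    """Remove one pair of unit clauses {x}, {-x}; exactly one is satisfied."""
    units = Counter(next(iter(c)) for c in formula if len(c) == 1)
    for l in units:
        if l > 0 and units[-l] > 0:
            rest = list(formula)
            rest.remove(frozenset({l}))
            rest.remove(frozenset({-l}))
            return rest, 1
    return formula, 0

# Hypothetical input: {x1, x2}, {not x1, x3}, {not x2}, {x2}
F = [frozenset({1, 2}), frozenset({-1, 3}), frozenset({-2}), frozenset({2})]
F1, gained1 = complementary_unit_rule(F)
F2, gained2 = pure_literal_rule(F1)
print(F1, gained1, F2, gained2)
```

Each function returns the simplified formula together with the number of clauses it has accounted for, mirroring the phrase "increasing the number of satisfied clauses by ..." in the rules above.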



3 A Bound in the Number of Satisfiable Clauses 

Theorem 1 For a 2CNF formula F, it can be computed in time $O(|F| + 1.2886^k k)$
whether or not at least k clauses are simultaneously satisfiable.

Theorem 1 is of special interest in so-called parameterized complexity theory [5].
The corresponding bound for formulas in CNF is $O(|F| + 1.3803^k k^2)$ [1]. In
this expression, $1.3803^k$ gives an estimation of the branching tree size. The time
spent in each node of the tree is $O(|F|)$, which for CNF formulas is shown to
be bounded by $k^2$. For 2CNF formulas, however, we can improve this factor
for every node of the tree from $k^2$ to $k$: Note that the case where $k \le \lceil K/2 \rceil$, with
K as the number of clauses, is trivial, since for a random assignment, either this
assignment or its inverse satisfies at least $\lceil K/2 \rceil$ clauses. For $k > \lceil K/2 \rceil$, however,
Max2Sat formulas have $|F| = O(k)$.
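
The trivial case rests on the fact that every non-empty clause is satisfied by an assignment or by its inverse, so the two together satisfy at least K clauses. A minimal sketch, assuming integer literals (negative means negated) and a dictionary from variable to truth value; the names are illustrative:

```python
def satisfied(formula, assignment):
    """Count clauses that contain at least one true literal."""
    return sum(
        any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in formula
    )

def better_of_assignment_and_inverse(formula, assignment):
    """Return whichever of the assignment and its inverse satisfies more clauses."""
    inverse = {v: not b for v, b in assignment.items()}
    return max((assignment, inverse), key=lambda a: satisfied(formula, a))

# Hypothetical formula with K = 5 clauses over variables 1 and 2.
formula = [(1, 2), (-1, 2), (-2,), (1, -2), (-1,)]
best = better_of_assignment_and_inverse(formula, {1: False, 2: False})
print(best, satisfied(formula, best))
```

Whatever starting assignment is chosen, the returned one satisfies at least half of the clauses, which is all that is needed when k is at most ⌈K/2⌉.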

Before sketching the remaining proof of Theorem 1, we give a corollary. Consider
a 2CNF input formula in which every variable occurs at most three times.
This problem is also NP-complete [13], but we can improve our upper bounds by
excluding some of the cases necessary for general 2CNF formulas, thus obtaining
a better branching than in Theorem 1. We omit details.



Corollary 2 For a 2CNF formula F where every variable occurs at most three
times, it can be computed in time $O(|F| + 1.2107^k k)$ whether or not at least k
clauses are simultaneously satisfiable.
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We now sketch the proof of Theorem 1. We present algorithm A with the given 
running time. As an invariant of our algorithm, observe that the subsequently 
described branching rules are only applied if the formula is reduced, that is, there 
is no transformation rule to apply. The idea of branching is based on dividing the 
search space, i.e. the set of all possible assignments, into several parts, finding an 
optimal assignment within each part, and then taking the best of them. Carefully 
selected branchings enable us to simplify the formula in some of the branches. 
Observe that the subsequent order of the steps is important. In each step, the 
algorithm always executes the applicable branching rule with the lowest possible 
number: 

RULE 1: If there is a $(9^+, 1)$-, $(6^+, 2)$-, or $(4^+, 3^+)$-literal x, then we branch
into $F[x]$ and $F[\bar{x}]$. The correctness of this rule is clear. In the worst case, a
(4, 3)-literal, by branching into $F[x]$, we may satisfy 4 clauses and by branching
into $F[\bar{x}]$, we may satisfy 3 clauses. We describe this situation by saying that
we have a branching vector (4, 3), which expresses the corresponding recurrence
for the search tree size, solvable by standard methods (cf. [1,12]). Solving the
corresponding recurrence for the branching tree size, we obtain here the branching
number 1.2208. This means that were we always to branch according to a
(4, 3)-literal, the branching tree size would be bounded by $1.2208^k$. It is easy to
check that branching vectors (9, 1) and (6, 2) yield better (i.e., smaller) branching
numbers.
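
A branching vector $(d_1, \ldots, d_r)$ stands for the recurrence $T(k) = T(k - d_1) + \cdots + T(k - d_r)$, and its branching number is the unique root $x > 1$ of $\sum_i x^{-d_i} = 1$. The following sketch computes that root by bisection; this is the standard method, and the function name is an illustrative assumption rather than something from the paper.

```python
def branching_number(vector, tol=1e-12):
    """Return the root x > 1 of sum(x**(-d) for d in vector) == 1."""
    if len(vector) < 2 or min(vector) <= 0:
        raise ValueError("need at least two positive entries")

    def excess(x):
        # Strictly decreasing in x; positive at x = 1 because len(vector) >= 2.
        return sum(x ** -d for d in vector) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0.0:   # widen the bracket until the sign changes
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return hi

for vec in [(4, 3), (9, 1), (6, 2)]:
    print(vec, branching_number(vec))
```

For (4, 3) the result agrees with the 1.2208 quoted above up to rounding in the fourth decimal, and (9, 1) and (6, 2) give smaller values, as claimed.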

RULE 2: If there is a (2, 1)-literal x, such that $F = \{\{x, y\}, \{x, z\}\} \cup G$ and
y occurs at least as often in F as z, then branch as follows: If both y and z are
(2, 1)-literals, branch into $F[x]$ and $F[\bar{x}]$. We can show a worst case branching
vector of (4, 5) in these situations. Otherwise, i.e., if one of y and z is not (2, 1),
then branch into $F[y]$ and $F[\bar{y}]$. The correctness is again obvious. However, the
complexity analysis (i.e., analysis of the branching vectors) is significantly harder
in this case. Keep in mind that the formula is reduced, meaning that we may
exclude all cases where a transformation rule would apply.

First, we distinguish according to the number of common occurrences of
x and y: Assuming that there are three common occurrences, we either have
clauses $\{x, y\}, \{x, \bar{y}\}$, clauses $\{x, y\}, \{\bar{x}, y\}$, or clauses $\{x, y\}, \{x, y\}, \{\bar{x}, \bar{y}\}$. In
the first two cases, the Almost Common Clauses Rule applies (cf. Section 2), and
in the latter case, the first of the Three Occurrence Rules applies. Analogously,
assuming two common occurrences, either the Almost Common Clauses Rule or
the second of the Three Occurrence Rules applies. Hence, because the formula
is reduced, we can neglect these cases.

It remains to consider only one common occurrence of x and y. We make the
following observation: By satisfying y, we reduce literal x to occurrence two and
the Resolution Rule applies, eliminating x and satisfying one additional clause.
On the other hand, satisfying $\bar{y}$ leaves a unit occurrence of x and the Dominating
1-Clause Rule applies, eliminating x from the formula and satisfying the two
x-clauses. Now we consider each possible occurrence pattern for literal y. If y occurs
at least four times, it is a $(3^+, 1)$-, $(1, 3^+)$-, or a $(2^+, 2^+)$-literal, and using the
given observation, in the worst cases we obtain branching vectors (4, 3), (2, 5),
or (3, 4). If y occurs only three times, it is a (2, 1)- or (1, 2)-literal. We then
take literal z into consideration as well. We know from the way in which y was
chosen that the literal z is also of occurrence three. We consider all combinations
of y and z, which are either (2, 1) or (1, 2), and also cover a possible common
occurrence of y and z in one clause. Branching as specified, we obtain in the worst
case a branching vector (2, 6), namely when both y and z are (1, 2) and there
is no common clause of y and z. We omit the details here.

Summarizing, for RULE 2, the worst observed branching vector is (2,5), 
which corresponds to the branching number 1.2365. 

RULE 3: If there is a $(3^+, 3^+)$- or $(4^+, 2)$-literal x, then branch into $F[x]$
and $F[\bar{x}]$. Trivially we get the branching vectors (3, 3) and (4, 2), implying the
branching numbers 1.2600 and 1.2721.

RULE 4: If there is a (c, 1)-literal x with $c \in \{3, 4, 5, 6, 7, 8\}$, then choose a
literal y occurring in a clause $\{x, y\}$ and branch into $F[y]$ and $F[\bar{y}]$. Again, this
is clearly correct.

With regard to the complexity analysis, we observe that by satisfying $\bar{y}$, a
unit occurrence of x arises and the Dominating 1-Clause Rule applies, satisfying
all x-clauses. Having reached RULE 4, we know that all literals in the formula
occur at least four times, as the 3-occurrences are eliminated by RULE 2. We
consider different possible cases for y, namely y being a $(3^+, 1)$-, $(1, 3^+)$-, or
$(2^+, 2^+)$-literal, and we consider all possible numbers of common occurrences of
x and y. Using the given observation, we can show a branching vector of (1, 6)
in the worst case, namely for a (3, 1)-literal x, a (1, 3)-literal y, and $\#_{xy} = 1$.
This corresponds to the branching number 1.2852. Again, we omit the details.

RULE 5: By this stage, there remain only (2, 2)-, (3, 2)-, or (2, 3)-literals
in the formula. RULE 5 deals with the case that there is a (2, 2)-literal x. Our
branching rule now is more involved. We choose a literal y occurring in a clause
$\{x, y\}$ and a literal z occurring in a clause $\{\bar{x}, z\}$. For x having at least two
common occurrences with y or z, we branch into $F[x]$ and $F[\bar{x}]$. If this is not
the case but y and z have at least two common occurrences, we branch into $F[y]$
and $F[\bar{y}]$. It remains that $\#_{xy} = 1$, $\#_{xz} = 1$, and $\#_{yz} \le 1$. If y and z have a
common occurrence in a clause $\{y, z\}$, we branch into $F[y]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$. If
not, i.e. there is no clause $\{y, z\}$, we branch into $F[yz]$, $F[y\bar{z}]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$.
It is easy to verify that we have covered all possible cases.

Regarding the complexity analysis, we first make use of the following: Whenever
two literals being (2, 2) or (3, 2) have at least two common occurrences, we
can take one of them and branch setting it true and false. In the worst case, this
results in the branching vector (2, 5) with branching number 1.2366.

Thus, we are only left with situations in which $\#_{xy} = 1$, $\#_{xz} = 1$, and
$\#_{yz} \le 1$. For these cases, we consider all possible arrangements of x, y, and z,
with x being (2, 2), y being $(2^+, 2^+)$, and z being $(2^+, 2^+)$. We obtain “good”
branching numbers of 1.2886 for vectors such as (5, 6, 5, 6) in most cases
by branching into $F[yz]$, $F[y\bar{z}]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$. Only for a possible common
occurrence of y and z in a clause $\{y, z\}$ would the branching number be worse.
We avoid this by branching into $F[y]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$ instead. Here, we study
in more detail what happens in the single subcases: Setting y true in the first
subcase of the branching, we satisfy two y-clauses. By setting y false and z true in
the second subcase, we directly eliminate two $\bar{y}$- and two z-clauses. Consequently,
the Dominating 1-Clause Rule now applies for x and satisfies two additional
clauses. In total, we satisfy six clauses in the second subcase. Setting y and z
false in the third subcase, we satisfy two $\bar{y}$- and two $\bar{z}$-clauses. In addition, there
arise unit clauses for x and $\bar{x}$ such that the Complementary 1-Clause Rule and
then the Resolution Rule apply, satisfying two additional clauses. Summarizing
these considerations, the resulting branching vector is (2, 6, 6) with branching
number 1.3022.

For our purpose, this vector is still not good enough. However, we observe
that in the first branch x is reduced to occurrence three, meaning that in this
branch the next rule that will be applied will undoubtedly be RULE 2. We recall
that RULE 2 yields the branching vector (2, 5), and possibly even a better one.
Combining these two steps, we obtain the branching vector (4, 7, 6, 6) and the
branching number 1.2812.

Note that in RULE 5, we have the real worst case of the algorithm, namely
for the situation of $\#_{xy} = \#_{xz} = 1$ and y and z having their common
occurrence in a clause {y, z}. For this situation, we can find no branching rule
improving the branching number 1.2886.

RULE 6: When this rule applies, all literals in the formula are either (3, 2)
or (2, 3). We choose a (3, 2)-literal x. The branching instruction is now primarily
the same as in RULE 5 above. However, it is now possible that there is no literal
z occurring in a clause $\{\bar{x}, z\}$, as the two $\bar{x}$-occurrences may be in unit clauses.
In this case, i.e. for two $\bar{x}$-unit clauses, we branch into $F[y]$ and $F[\bar{y}]$. Having two
or more common occurrences for a pair of x, y, and z, we branch as in RULE 5.
For the remaining cases, i.e. $\#_{xy} = 1$, $\#_{xz} = 1$, and $\#_{yz} \le 1$, we branch into
$F[y]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$.

The complexity analysis works analogously to RULE 5. For $\#_{xy} = 1$, $\#_{xz} = 1$,
and $\#_{yz} \le 1$, we test all possible arrangements of x, y, and z with x being
(3, 2) and y and z being either (3, 2) or (2, 3). The worst case branching vector
in these situations, when branching into $F[y]$, $F[\bar{y}z]$, and $F[\bar{y}\bar{z}]$, is (2, 9, 5) and
yields the branching number 1.2835. Again, we omit the details.

4 A Bound in the Formula Length 

Compare Theorem 3 with the $O(1.1058^{|F|})$ time bound for MaxSat [1]. Observe
that when the exponential bases are close to 1, even small improvements in the
exponential base can mean significant progress.

Theorem 3 Max2Sat can be solved in time $O(1.0970^{|F|})$.

We sketch the proof of Theorem 3, presenting Algorithm B with the given run- 
ning time. For the most part, it is equal to Algorithm A, sharing the branching 
instructions of RULEs 1 to 4. Taking up ideas given in [1], we replace RULE 5 
and 6 with new branching instructions RULE 5', 6', 7', and 8'.

For the rules known from Algorithm A, it remains to examine their branching
vectors with respect to formula length. As the analysis is in essence the same 
as that of the proof for Theorem 1, we omit the details once again, while only 
stating that the worst case branching vector with respect to formula length for 
RULEs 1 to 4 is (7,8) (branching number 1.0970), and continue with the new 
instructions: 

RULE 5': Upon reaching this rule, all literals in the formula are of type
(2, 2), (3, 2), or (2, 3). RULE 5' deals with the case that there is a (3, 2)-literal
x which is not (3, 2)[2, 2, 2][1, 2].

If x is a (3, 2)[2, 2, 2][2, 2]-literal, we branch into F[x] and F[x̄]. Counting the
literals eliminated in either branch, we easily obtain a branching vector of (8, 7).

If x is a (3, 2)[2, 2, 2][1, 1]-literal with clauses {x, y1}, {x, y2}, and {x, y3} in
which some of y1, y2, and y3 may be equal, we branch into F[x] and F[x y1 y2 y3].
This is correct, as should we want to satisfy more clauses by setting x to true
than by setting x to false, all y1, y2, and y3 must be falsified. We easily check
that if all y1, y2, and y3 are equal, we obtain a branching vector of (10, 10).
For at least two literals of y1, y2, and y3 being distinct, we eliminate in the
first subcase eight literals, namely the literals in the satisfied x-clauses and the
falsified x-literals. In the second subcase, we eliminate x, having five occurrences,
and two variables having at least four occurrences. This gives a branching vector
of (8, 13), corresponding to the branching number 1.0866.

If, finally, x is a (3, 2)[1, 2, 2][2, 2]-literal with clauses {x, z1}, {x, z2} in
which z1 and z2 may be equal, we branch into F[x] and F[x z1 z2]. The correctness
is shown as in the previous case. In the first branch, we directly eliminate seven
literals. In the second branch, we eliminate literal x having five occurrences and
at least one literal having four or five occurrences. This gives a branching vector
of (7, 9), corresponding to the branching number 1.0910.

By using these branching instructions we obtain for RULE 5' the worst case 
branching vector (8, 7) in terms of formula length, namely for a (3, 2)[2, 2, 2][2, 2]- 
literal x. This corresponds to the branching number 1.0970 and will turn out to 
be the overall worst case in our analysis of the algorithm. 
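
The branching numbers quoted in this analysis can be recomputed numerically. The following is a minimal sketch, assuming the usual definition that the branching number of a vector (t1, ..., tr) is the unique root x > 1 of x^(-t1) + ... + x^(-tr) = 1; the function name and the bisection tolerance are illustrative choices, not part of the original algorithm.

```python
from typing import Sequence


def branching_number(vector: Sequence[float], tol: float = 1e-12) -> float:
    """Return the root x > 1 of sum(x**(-t) for t in vector) == 1.

    The left-hand side is strictly decreasing in x, so bisection suffices.
    """
    if len(vector) < 2 or any(t <= 0 for t in vector):
        raise ValueError("need at least two positive entries")

    def excess(x: float) -> float:
        return sum(x ** (-t) for t in vector) - 1.0

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:          # enlarge the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if excess(mid) > 0:
            lo = mid               # root lies to the right of mid
        else:
            hi = mid               # root lies at or to the left of mid
    return hi


# Vectors taken from the analysis of Algorithm B.
for vec in [(8, 7), (7, 9), (5, 12), (8, 8)]:
    print(vec, round(branching_number(vec), 4))
```

For the vector (8, 7) the computed root lies just below 1.0970, in line with the worst case stated for RULE 5'.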

RULE 6': Upon reaching this rule, all remaining literals in the formula are
either (2, 2), (3, 2)[2, 2, 2][1, 2], or (2, 3)[1, 2][2, 2, 2]. RULE 6' deals with the case
that there is a (2, 2)[2, 2][1, 2]-literal x, i.e. a (2, 2)-literal having a unit occurrence
of x̄. As this rule is similar to RULE 5', we omit the details here and claim a
worst case branching vector of (5, 12) corresponding to the number 1.0908.

RULE 7' eliminates all remaining (3, 2)-literals, namely those of type
(3, 2)[2, 2, 2][1, 2]. We select literals y1, y2, y3, and z from clauses {x, y1}, {x, y2},
{x, y3}, and {x, z}. If there is a variable y which equals at least two of the
variables y1, y2, y3, and z, we branch into subcases F[y] and F[ȳ]. Otherwise,
i.e. if all variables y1, y2, y3, and z are distinct, we branch into subcases F[y1 x],
F[y1 x y2 y3 z], and F[y1]. The analysis of this rule is omitted here, as it is to a large
extent analogous to the final RULE 8', which we will study in more detail.

RULE 8' applies to the (2, 2)[2, 2][2, 2]-literals, which are the only literals
remaining in the formula. Consider clauses {x, y1}, {x, y2}, {x, z1}, and {x, z2}.
In the case where there is a variable y which equals two of the variables y1,
y2, z1, or z2, i.e. y has two or more common occurrences with x, we branch
into F[y] and F[ȳ]. We can easily see how to obtain a branching vector of (8, 8)
and the branching number 1.0906, as setting a value for y implies a value for x.
Therefore, we proceed to the case of distinct variables y1, y2, z1, and z2.

First, we discuss the correctness of the subcases. The correctness of the
subcases F[y1 x], F[y1 x], F[y1 x], and F[y1 x] is obvious. Now assume in the second
branch that a partner of x, e.g. z1, would be falsified. Then, in comparison to
the first branch, we would lose the now falsified clause {x, z1}, but could, in the
best case, gain one additional x-clause. On the other hand, assume that y2 would
be satisfied. Then in the second branch, as compared with the first one, we cannot
gain any additional x-clause, but could lose some x-clauses. This shows that
in the second branch, we can neglect the considered assignments, as they do
not improve the result obtained in the first branch. Analogously, we obtain the
additional assignments in the fourth branch and, therefore, branch into subcases
F[y1 x], F[y1 x y2 z1 z2], F[y1 x], and F[y1 x y2 z1 z2]. Knowing that all literals in the
formula are (2, 2)[2, 2][2, 2], we obtain the vector (11, 20, 11, 20).

As this vector does not satisfy our purpose, we further observe that in branch
F[y1 x] and in branch F[y1 x], there are undoubtedly literals reduced to an
occurrence of three or two. These literals are either eliminated due to further
reduction, or give rise to a RULE 2 branching in the next step. We check that the worst
case branching vector in RULE 2 is (7, 10). Combining these steps, we are now
able to give a worst case branching vector for RULE 8' of (18, 21, 20, 18, 21, 20),
corresponding to the branching number 1.0958.
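
The combination step can be reproduced mechanically. The sketch below assumes that "combining" means replacing a branch of cost t by the branches t + s for every entry s of the follow-up vector; the helper name `expand` is hypothetical.

```python
def expand(vector, index, follow_up):
    """Replace entry `index` of `vector` by that entry plus each follow-up entry."""
    base = vector[index]
    return vector[:index] + tuple(base + s for s in follow_up) + vector[index + 1:]


rule8 = (11, 20, 11, 20)   # vector of RULE 8' before the refinement
rule2 = (7, 10)            # worst case vector of RULE 2 in terms of formula length

# Expand the third branch first so that index 0 still refers to the first branch.
step = expand(rule8, 2, rule2)
combined = expand(step, 0, rule2)
print(combined)
```

Both branches of cost 11 are replaced by the pair (18, 21), while the two branches of cost 20 are left unchanged.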

This completes our algorithm and its analysis in terms of formula length.
Omitting some details, we have shown a worst case branching number of 1.0970
in all branching subcases, which justifies the claimed time bound.

For MaxSat in terms of the number of clauses K, the upper time bound
O(1.3413^K) is known [1]. Setting |F| = 2K in Theorem 3, we obtain:

Corollary 4 Max2Sat can be solved in time O(1.2035^K).
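
The step from Theorem 3 to Corollary 4 is a short calculation. It is spelled out here under the assumption that every clause of a 2CNF formula contributes at most two literals, so that |F| ≤ 2K:

```latex
\[
  1.0970^{|F|} \;\le\; 1.0970^{2K}
  \;=\; \bigl(1.0970^{2}\bigr)^{K}
  \;\approx\; 1.20341^{K}
  \;\le\; 1.2035^{K}.
\]
```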

Using this algorithm we can also solve the Maximum Cut problem, as we 
can translate instances of the Maximum Cut problem into 2CNF formulas [11]. 
In fact, these formulas exhibit a special structure and we can modify and even 
simplify the shown algorithm, in order to obtain better bounds on formulas having
this special structure. As shown in [8], on 2CNF formulas generated from
Maximum Cut instances, Max2Sat can be solved in time O(1.0718^|F|) and
O(|F| + 1.2038^k), where k is the maximum number of satisfiable clauses in the
formula. This implies the bounds for Maximum Cut shown in Theorem 5. Observe
for part (2) that Maximum Cut, when restricted to graphs of vertex degree
at most three, is NP-complete [7].
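
To illustrate the translation, the following sketch uses one standard encoding, which is an assumption here and may differ in detail from the one in [11]: every edge {u, v} yields the two clauses {u, v} and {ū, v̄}. Both clauses are satisfied exactly when u and v lie on different sides of the cut, and exactly one is satisfied otherwise, so the maximum number of satisfied clauses equals m plus the size of a maximum cut. With |F| = 4m this appears consistent with part 1 of Theorem 5. The brute-force solver is only a check on tiny instances and is not Algorithm B.

```python
from itertools import product


def maxcut_to_2cnf(edges):
    """Encode each edge (u, v) as clauses (u or v) and (not u or not v).

    A literal is a pair (vertex, polarity).
    """
    clauses = []
    for u, v in edges:
        clauses.append(((u, True), (v, True)))
        clauses.append(((u, False), (v, False)))
    return clauses


def max_sat_brute_force(clauses):
    """Try all assignments; exponential, for illustration only."""
    variables = sorted({var for clause in clauses for var, _ in clause})
    best = 0
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        satisfied = sum(
            any(assignment[var] == polarity for var, polarity in clause)
            for clause in clauses
        )
        best = max(best, satisfied)
    return best


triangle = [(0, 1), (1, 2), (0, 2)]
formula = maxcut_to_2cnf(triangle)
max_cut = max_sat_brute_force(formula) - len(triangle)
print(max_cut)  # a triangle has a maximum cut of size 2
```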



Theorem 5 1. For a graph with n vertices and m edges the Maximum Cut
problem is solvable in O(1.3197^m) time.

2. If the graph has vertex degree at most three, then Maximum Cut can be
solved in time O(1.5160^n). If the graph has vertex degree at most four, then
Maximum Cut can be solved in time O(1.7417^n).

3. We can compute in time O(m + n + k · 1.7445^k) whether there is a maximum
cut of size k.

5 Experimental Results 

Here we indicate the performance of our algorithms A (Section 3) and B (Section 4),
and compare them to the two-phase heuristic algorithm for MaxSat
presented by Borchers and Furman [3]. The tests were run on a Linux PC with an
AMD K6 processor (233 MHz) and 32 MByte of main memory. All experiments
are performed on random 2CNF-formulas generated using the MWFF package 
from Bart Selman [14]. We take different numbers of variables and clauses into 
consideration and, for each such pair, generate a set of 50 formulas. As results, 
we give the average for these sets of formulas. If at least one of the formulas in a 
set takes longer than 48 hours, we do not process the set and indicate this in the 
table by “not run”. Our algorithms are implemented in JAVA. This gives credit 
to the growing importance of JAVA as a convenient and powerful programming 
language. Furthermore, our aim is to show how the algorithms limit the expo- 
nential growth in running time, being effective independent of the programming 
language. The algorithm of Borchers and Furman is coded in C. Coding a sim- 
ple program for Fibonacci recursion in C and JAVA and running it in the given 
environment, we found the C program to be faster by a factor of about nine. 
Due to the different performance of the programming languages, it is difficult 
to only compare running times. As a fair measure of performance we, therefore, 
also provide the size of the scanned branching tree, as it is responsible for the 
exponential growth of the running time. More precisely, for the branching tree 
size we count all inner nodes, where we branch towards at least two subcases. 

There is almost no difference in the performance between algorithms A and B; 
therefore they are not listed separately. This is plausible, as in the processing of 
random formulas, the “bad” case situations in whose handling our algorithms 
differ, are rare. On problems of small size, the 2-phase-EDPL (Extended
Davis-Putnam-Loveland) algorithm of Borchers and Furman [3] has smaller running
times despite its larger branching trees. One reason may also be the difference 
in performance of JAVA and C. Nevertheless, with a growing problem size our 
algorithm does a better job in keeping the exponential growth of the branching 
tree small, which also results in significantly better running times, see Table 1. 

In order to gain insight into the performance of our rules, we collected some 
statistics on the application of the transformation and branching rules. For al- 
gorithm B, we examine which rules apply how often during a run on random 
formulas. First, we consider the transformation rules. Note that at one point, 
several transformation rules could be applicable to a formula. Therefore, for judg- 
ing the results, it is important to know the sequence in which the application 
of transformation rules is tested. We show the results in Table 2, with the rules

                 Algorithm B             2-Phase-EDPL
    n     m      Tree       Time         Tree       Time
   25   100        16       0.77          961       0.27
        200       108       1.78       37 092       1.93
        400       385       5.41      514 231      43.72
        800       752      12.59    2 498 559    9:16.51
   50   100         6       0.70           69       0.48
        200       320       4.48      611 258      27.11
        400    18 411    3:45.80      not run          -
  100   200        36       1.14       10 872       2.14
        400    91 039   23:50.09      not run          -
  200   400     1 269      21.87      not run          -

Table 1. Comparison of average branching tree sizes (Tree) and average running
times (Time), given in minutes:seconds, of our Algorithm B and the 2-phase-
EDPL by Borchers and Furman. Tests are performed on 2CNF formulas with
different numbers of variables (n) and clauses (m).



Variables                            25                         50
Clauses                   100    200    400    800    100    200      400
Search Tree Size           16    108    385    752      6    320   18 411
Almost Common Cl.          10     38    102    235      4     76    1 421
Pure Literals              26     82    111     99     34    658   11 476
Dominating 1-Clause       123    704  1 844  2 839     42  3 387  134 425
Complementary 1-Clause     40    571  2 831  6 775      4  1 173  128 030
Resolution                 22     57     54     30     25    726   10 821
Three Occurrences 1         0      1      4      7      0      1       35
Three Occurrences 2         6     25     51     47      2     86    2 810

Table 2. Statistics about the application of transformation rules in algorithm
B on random formulas.



being in the order in which they are applied. Considering the shown and
additional data, we find application profiles that are characteristic of
variable/clause ratios. We observe for formulas having a higher ratio, i.e. with
fewer clauses for a fixed number of variables, that the Dominating 1-Clause Rule
is the rule which is applied most often. With a lower ratio, i.e. when we have more
clauses for the same number of variables, the Complementary 1-Clause Rule gains in
importance.

Besides the transformation rules, we also study the frequency with which the
single branching rules are applied. Recall that algorithm B has a list of eight
different cases with corresponding branching rules. We show the results
collected during runs on random formulas in Table 3. We observe that most
branching steps occur with RULE 1 or RULE 2. The other rules are used in
less than one percent of the branchings. It is reasonable that in formulas with

Variables                 25                                 50
Clauses       100     200     400     800     100     200        400
Tree Size   15.26   107.4   384.4  751.18    5.26  319.36  18 410.38
RULE 1      10.62  102.32  381.76  749.42    0.88  217.36  18 300.02
RULE 2       4.02    3.54    1.18    0.32    4.34  100.12      97.54
RULE 3       0.5     0.9     0.94    0.64    0       1.44      11.14
RULE 4       0.08    0.44    0.26    0.36    0.04    0.34       1.22
RULE 5'      0.04    0.12    0.16    0.26    0       0.08       0.3
RULE 6'      0       0.06    0.1     0.16    0       0.02       0.16
RULE 7'      0       0.02    0       0       0       0          0
RULE 8'      0       0       0       0.02    0       0          0

Table 3. Statistics on the application of branching rules in algorithm B on
random formulas having n variables and m clauses. Each result is the average
over 50 formulas, which explains the non-integer values and makes even the
application of very rare rules visible.



a high variable/clause ratio, i.e. fewer clauses, we have more variables with an
occurrence of three. Therefore, the rule applied most while processing these
formulas is RULE 2. As the variable/clause ratio shifts down, i.e. when we have
more clauses for the same number of variables, there necessarily are more
variables with a large number of occurrences in the formula. Consequently, RULE 1
becomes dominating.

Considering our statistics, we can roughly conclude: Some of the transforma- 
tion rules are, in great part, responsible for the good practical performance of 
our algorithms, as they help to decrease the search tree size. The less frequent 
transformation rules and the rather complex set of branching rules, on the other 
hand, are mainly important for guaranteeing good theoretical upper bounds. 

6 Open Questions 

There remains the option of investigating exact algorithms for other versions
of Max2Sat, for example, Max3Sat. Furthermore, n being the number of
variables, can Max2Sat be solved in less than 2^n steps? Regarding Hirsch’s
recent theoretical results [10], it seems a promising idea to combine our algorithm
with his, in order to improve the upper bounds for Max2Sat even further.

Abstract. In this paper, we propose an improved algorithm for dynamically
maintaining the widest k-dense corridor as proposed in [6]. Our
algorithm maintains a data structure of size O(n^2), where n is the number
of points present on the floor at the current instant of time. For each
insertion/deletion of points, the data structure can be updated in O(n)
time, and the widest k-dense corridor in the updated environment can
be reported in O(kn + n log n) time.



1 Introduction 

Given a set S of n points in the Euclidean plane, a corridor C is defined as an
open region bounded by parallel straight lines such that it intersects the
convex hull of S [3]. The width of the corridor C is the perpendicular distance
between the bounding lines ℓ' and ℓ''. The corridor is said to be k-dense if C
contains k points in its interior. The widest k-dense corridor through S is a
k-dense corridor of maximum width [1]. See Figure 1 for an illustration.
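
These definitions can be made concrete for a non-vertical corridor. The sketch below assumes the two bounding lines are given as y = m·x + t1 and y = m·x + t2; this parametrisation and the function names are illustrative choices, not notation from the paper.

```python
import math


def corridor_width(m, t1, t2):
    """Perpendicular distance between y = m*x + t1 and y = m*x + t2."""
    return abs(t1 - t2) / math.sqrt(1.0 + m * m)


def points_inside(points, m, t1, t2):
    """Points lying strictly between the two bounding lines (open region)."""
    lo, hi = sorted((t1, t2))
    return [(x, y) for x, y in points if lo < y - m * x < hi]


def is_k_dense(points, m, t1, t2, k):
    """A corridor is k-dense if exactly k points lie in its interior."""
    return len(points_inside(points, m, t1, t2)) == k


S = [(0, 0), (1, 2), (2, 2), (3, 6), (4, 3.5)]
# Lower line through (0, 0) and (2, 2), upper line through (3, 6): slope 1.
print(corridor_width(1, 0, 3), points_inside(S, 1, 0, 3))
```

In this example only the point (1, 2) lies strictly inside, so the corridor is 1-dense; points on the bounding lines do not count.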

The widest empty corridor problem was first proposed in [3] in the context of
robot motion planning, where the objective was to find a widest straight route
avoiding obstacles. They also proposed an algorithm for this problem with time
and space complexities O(n^2) and O(n) respectively. The widest k-dense
corridor problem was introduced in [1] along with an algorithm of time and space
complexities O(n^2 log n) and O(n^2) respectively. Here the underlying assumption
is that the robot can pass through (or in other words, can tolerate collision
with) a specified number (k) of obstacles. In [4], the space complexity of the
widest k-dense corridor problem was improved to O(n). In the same paper, they
suggested an O(n log n) time and O(n^2) space algorithm for maintaining the
widest empty corridor where the set of obstacles is dynamically changing.
However, the dynamic problem for general k (> 0) was posed as an open problem.
In [6], both the static and dynamic versions of the k-dense corridor problem are
studied. The time and space complexities of their algorithm for the static version

* This work was done when the author was visiting School of Information Science, 
Japan Advanced Institute of Science and Technology, Ishikawa 923-1292, Japan.

Fig. 1. Two types of corridors



of the problem are O(n^2) and O(n^2) respectively. For the dynamic version, their
algorithm is the pioneering work. Maintaining an O(n^2) size data structure, they
proposed an algorithm which reports the widest k-dense corridor after the
insertion or deletion of a point. The time complexity of their algorithm is O(K log n),
where O(K) is the combinatorial complexity of the (≤ k)-level of an arrangement of
n half-lines, each of which belongs to and touches the same side of a given line.
They proved that the value of K is O(kn) in the worst case.

In this paper, we improve the time complexity of the dynamic version of the
widest k-dense corridor problem. Given O(n^2) space for maintaining the data
structure, our algorithm can update the data structure and can report the widest
k-dense corridor in O(K + n log n) time. As it is an online algorithm, this reduction
in the time complexity is definitely important.



2 Geometric Preliminaries 

Throughout the paper, we assume that the points in S are in general position, 
i.e., no three points in S are collinear, and the lines passing through each pair 
of points have distinct slope. Theorem 1, stated below, characterizes a widest 
corridor among the points S. 

Theorem 1. [1, 3, 4, 6] Let C* be the widest corridor with bounding lines ℓ' and
ℓ''. Then C* must satisfy the following conditions:

(A) ℓ' touches two distinct points p_i and p_j of S and ℓ'' touches a single point
p_m of S, or

(B) ℓ' and ℓ'' contain points p_i and p_j respectively, such that ℓ' and ℓ'' are
perpendicular to the line through p_i and p_j.

From now onwards, a k-dense corridor satisfying conditions (A) and (B) will be
referred to as type-A and type-B corridors respectively (see Figure 1).

2.1 Relevant Properties of Geometric Duality



We follow the same tradition [1, 3, 4, 6] of using geometric duality for solving this
problem. It maps (i) a point p = (a, b) to the line D(p) : y = ax − b in the dual
plane, and (ii) a non-vertical line ℓ : y = mx − c to the point D(ℓ) = (m, c) in
the dual plane. Needless to say, a point p is below (resp., on, above) a line ℓ in
the primal plane if and only if D(p) is above (resp., on, below) D(ℓ) in the dual
plane. A line passing through two points p and q in the primal plane corresponds
to the point of intersection of the lines D(p) and D(q) in the dual plane, and
vice versa.
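
Both properties of the duality can be checked on a small example. In the sketch below a line y = m·x − c is stored as the pair (m, c), so the dual of a point (a, b) is the line (a, b) and the dual of a line (m, c) is the point (m, c); the helper names are illustrative and not taken from the paper.

```python
def dual_line(point):
    """Point (a, b) -> line y = a*x - b, stored as the pair (a, b)."""
    a, b = point
    return (a, b)


def dual_point(line):
    """Line y = m*x - c, stored as (m, c) -> point (m, c)."""
    m, c = line
    return (m, c)


def is_below(point, line):
    """True if the point lies strictly below the line y = m*x - c."""
    (px, py), (m, c) = point, line
    return py < m * px - c


def line_through(p, q):
    """Non-vertical line through p and q, in the form (m, c)."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)
    return (m, m * x1 - y1)


def intersection(l1, l2):
    """Intersection point of two non-parallel lines given as (m, c)."""
    (m1, c1), (m2, c2) = l1, l2
    x = (c1 - c2) / (m1 - m2)
    return (x, m1 * x - c1)


p, q, r = (1, 2), (3, 8), (0, -5)
ell = line_through(p, q)                      # y = 3x - 1
# The line through p and q dualises to the intersection of D(p) and D(q).
print(intersection(dual_line(p), dual_line(q)), dual_point(ell))
# r is below ell in the primal plane, so D(ell) is below D(r) in the dual plane.
print(is_below(r, ell), is_below(dual_point(ell), dual_line(r)))
```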

For the k-dense vertical corridors, we cannot apply geometric duality theory, and
so we apply the vertical line sweep technique. We maintain a balanced binary leaf
search tree, say a BB(α) tree, with the existing set of points in the primal plane.
Here each point in S appears at the leaf level, and is attached with the width of
the widest k-dense vertical corridor with its left boundary passing through that
point. At each non-leaf node, we attach the width of the widest vertical k-dense
corridor in the subtree rooted at that node. It can be easily shown that for each
insertion/deletion of a point, the necessary updates in this data structure and
the reporting of the widest k-dense vertical corridor can be done in O(k + log n) time.
Below, we concentrate on studying the properties of the non-vertical corridors.
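
For comparison, the quantity stored in this tree can be recomputed from scratch. The following sketch is a static computation, not the dynamic BB(α) structure, and assumes that all x-coordinates are distinct: a k-dense vertical corridor whose left boundary passes through the i-th point in x-order has its right boundary through the (i + k + 1)-th point.

```python
def widest_vertical_k_dense(points, k):
    """Return (width, x_left, x_right) of a widest k-dense vertical corridor,
    or None if fewer than k + 2 points are available."""
    xs = sorted(x for x, _ in points)
    best = None
    for i in range(len(xs) - k - 1):
        width = xs[i + k + 1] - xs[i]       # exactly k points lie strictly between
        if best is None or width > best[0]:
            best = (width, xs[i], xs[i + k + 1])
    return best


pts = [(0, 5), (1, 1), (4, 2), (6, 0), (7, 3)]
print(widest_vertical_k_dense(pts, 0))
print(widest_vertical_k_dense(pts, 1))
```

Sorting dominates the cost of this recomputation; the tree described above avoids repeating it after every insertion or deletion.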

Consider the two bounding lines ℓ' and ℓ'' of a corridor C in the primal plane,
which are mutually parallel. The corresponding two points, D(ℓ') and D(ℓ''),
in the dual plane will have the same x-coordinate. Thus a corridor C will be
represented by a vertical line segment joining D(ℓ') and D(ℓ'') in the dual plane,
which will be denoted by D(C). The width of C is
|y(D(ℓ')) − y(D(ℓ''))| / sqrt(1 + (x(D(ℓ')))^2), which will be
referred to as the dual distance between the points D(ℓ') and D(ℓ''). Here x(p) and
y(p) denote the x- and y-coordinates of the point p respectively.
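
The dual distance is the ordinary distance between two parallel lines y = mx − c1 and y = mx − c2, namely |c1 − c2| / sqrt(1 + m^2), written in terms of their dual points (m, c1) and (m, c2). A minimal sketch, with an illustrative function name:

```python
import math


def dual_distance(d1, d2):
    """Width of the corridor whose bounding lines have dual points d1 and d2.

    Each dual point is (m, c) for a primal line y = m*x - c.
    """
    (m1, c1), (m2, c2) = d1, d2
    if m1 != m2:
        raise ValueError("dual points of parallel lines share their x-coordinate")
    return abs(c1 - c2) / math.sqrt(1.0 + m1 * m1)


# Primal lines y = x and y = x + 3, i.e. dual points (1, 0) and (1, -3).
print(dual_distance((1, 0), (1, -3)))
```

Note that the vertical length of the segment D(C) alone is not the width; it is scaled by a factor that depends on the common x-coordinate.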

Let H = {h_i = D(p_i) | p_i ∈ S} be the set of lines in the dual plane corresponding
to the n points of S in the primal plane. Let p be a point inside the corridor C. In
the dual plane, the points D(ℓ') and D(ℓ'') will lie on opposite sides of the line
D(p). Now, we have the following observation, which is a direct implication of
Theorem 1. The dual of a k-dense corridor is characterized in Observation 2.



Observation 1 Let C be a corridor bounded by a pair of parallel lines ℓ', ℓ''.

• Now, if C is a type-A corridor, ℓ' passes through p_i and p_j, and ℓ'' passes
through p_m. This implies that D(ℓ') corresponds to a vertex of A(H), which
is the point of intersection of h_i and h_j (denoted by h_i ∩ h_j), and D(ℓ'')
corresponds to a point on h_m satisfying x(D(ℓ'')) = x(h_i ∩ h_j).

• If C is a type-B corridor, ℓ' and ℓ'' pass through the two points p_i and p_j
respectively. This implies that D(ℓ') and D(ℓ'') will correspond to the two points
on h_i and h_j respectively, satisfying x(D(ℓ')) = x(D(ℓ'')) = −(1/x(h_i ∩ h_j)).

Thus, a non-vertical type-A corridor may uniquely correspond to a vertex of
A(H), and a non-vertical type-B corridor may also uniquely correspond to an
edge of A(H), on which its upper end point lies.

Observation 2 A corridor C is k-dense if and only if there are exactly
k lines of H that intersect the vertical line segment D(C) representing the dual
of the corridor C. Such a segment will be commonly referred to as a k-stick.

Thus, recognizing a widest k-dense non-vertical corridor in the primal plane is
equivalent to finding a k-stick in the dual plane having maximum dual length.
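
Observation 2 translates directly into a counting test. The sketch below stores each dual line y = a·x − b as the pair (a, b) and counts the lines that cross the open vertical segment between (x, y_low) and (x, y_high); treating the segment as open, so that the lines through its end points are not counted, is an assumption made here to match the open corridor.

```python
def lines_crossing_stick(dual_lines, x, y_low, y_high):
    """Number of lines y = a*x - b passing strictly between the stick's end points."""
    return sum(1 for a, b in dual_lines if y_low < a * x - b < y_high)


def is_k_stick(dual_lines, x, y_low, y_high, k):
    return lines_crossing_stick(dual_lines, x, y_low, y_high) == k


# Duals of the points (0, 0), (1, 2), (2, 2), (3, 6), (4, 3.5).
H = [(0, 0), (1, 2), (2, 2), (3, 6), (4, 3.5)]
# Corridor between y = x and y = x + 3: dual points (1, 0) and (1, -3).
print(lines_crossing_stick(H, 1, -3, 0))
```

Only the dual of (1, 2) crosses the segment, so the segment is a 1-stick, matching the single point inside the corresponding primal corridor.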



3 Widest Non-vertical k-Dense Corridor

We now explain an appropriate scheme for maintaining the widest non-vertical
k-dense corridor dynamically. Let A(H) denote the arrangement of the set of
lines H [2]. The number of vertices, edges and faces in A(H) are all O(n^2). In
the dynamic scenario, we need to suggest an appropriate data structure which
can be updated for insertion/deletion of points, and from which the widest k-dense
corridor can be reported efficiently in the changed scenario. As deletion is symmetric
to insertion, we shall explain our method for insertion of a new point in S only.



3.1 Data Structures 

We dynamically maintain the following data structure which stores the arrange- 
ment of the lines in H . It is defined using the concept of levels as stated below. 




Fig. 2. Demonstration of levels in an arrangement of lines 

Definition 1. [2] A point π in the dual plane is at level θ (0 ≤ θ ≤ n) if there
are exactly θ lines in H that lie strictly below π. The θ-level of A(H) is the
closure of the set of points on the lines of H whose levels are exactly θ in A(H),
and is denoted as L_θ(H). See Figure 2 for an illustration.

Clearly, L_θ(H) is a polychain from x = −∞ to x = ∞, and is monotone
increasing with respect to the x-axis. The vertices on L_θ(H) are precisely the union of
the vertices of A(H) at levels θ − 1 and θ. The edges of L_θ(H) are the edges of A(H)
at level θ. In Figure 2, a demonstration of levels in the arrangement A(H) is
shown. Here the thick chain represents L_1(H). Among the vertices of L_1(H),
those marked with empty (black) circles appear in level 0 (2) also. Each
vertex of the arrangement A(H) appears in two consecutive levels, and each edge
of A(H) appears in exactly one level. We shall store L_θ(H), 0 ≤ θ < n, in a data
structure as described below.
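
The level of a point follows directly from Definition 1. A minimal sketch, again storing each line y = a·x − b as the pair (a, b); the tolerance `eps` is an implementation assumption for floating-point input.

```python
def level(dual_lines, point, eps=1e-9):
    """Number of lines lying strictly below the point."""
    x, y = point
    return sum(1 for a, b in dual_lines if a * x - b < y - eps)


# Three lines: y = 0, y = x, and y = -x.
H = [(0, 0), (1, 0), (-1, 0)]
print(level(H, (2, -5)), level(H, (2, 0)), level(H, (2, 1)), level(H, (2, 10)))
```

At x = 2 the point (2, 0) lies on the line y = 0 and has only y = −x strictly below it, so it is at level 1.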

level-structure

It is an array of size n, called the primary structure, whose θ-th element is
composed of the following fields:

level-id : an integer containing the level-id θ.

left-ptr : pointing to the leftmost node of the secondary structure T_θ.
root-ptr : pointing to the root node of the secondary structure T_θ.
list-ptr : pointing to a linear linked list, called TEMP-list, each element of
which is a tuple (ℓ, r) of pointers. The TEMP-list data structure will be
explained after defining the secondary structure.

The secondary structure at a particular level θ, denoted as T_θ, is organized
as a height balanced binary tree (AVL-tree). The nodes of this tree correspond
to the vertices and edges at level θ in left to right order. In addition, each node
is attached with the following information.

Two integer fields, called LEN and MAX, are attached with each node. The
LEN-field contains the dual length of the k-stick attached to it. Here we
explicitly mention that, if a node corresponds to a vertex of the arrangement,
it defines at most one k-stick, but if it corresponds to an edge, more than
one k-stick may be defined by that edge. In that case, the LEN-field will
contain the length of the one having maximum length among them. A node
(corresponding to an edge) defining no k-stick will contain a value 0 in its
LEN-field. The MAX-field contains the maximum value of the LEN fields
among all the nodes in the subtree rooted at that node. This actually
indicates the widest one among all the k-dense corridors stored in the subtree
rooted at that node.

Apart from its two child pointers, each node of the tree has three more pointers. 
parent pointer : It helps in traversing the tree from a node towards its root. 
The parent pointer of the root node points to the corresponding element 
of the primary structure. 

neighbor-pointer : It helps the constant time access of the in-order successor 
of a node. 

self-indicator : As an element representing a vertex appears in the secondary
structure (T) of two consecutive levels, each of them is connected with
the other using this pointer.

By Observation 1 and the succeeding discussion, a type-A k-dense corridor
corresponds to a vertex of the arrangement. A vertex v ∈ A(H) appearing in levels,
say θ and θ + 1, may correspond to at most two k-sticks (corresponding to two
different type-A k-dense corridors), whose one end point is positioned at vertex
v, and whose other end points lie on some edge at levels θ − k − 1 (if θ − k − 1 ≥ 0)
and θ + k + 2 (if θ + k + 2 < n) respectively, and are attached to the vertex v
appearing in the corresponding levels. An edge e appearing at level θ stores at
most one k-stick, which is defined by it and another edge in the (θ − k − 1)-th level
and appears vertically below it.
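
The node layout of the secondary structure can be sketched as follows. This is an illustrative rendering under stated assumptions and not the authors' implementation: AVL rebalancing is omitted, the root's parent is None instead of a reference into the primary structure, and the field names `left`, `right`, `parent`, `neighbor`, and `twin` are hypothetical stand-ins for the child pointers, parent pointer, neighbor-pointer, and self-indicator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False, repr=False)
class Node:
    kind: str                            # "vertex" or "edge"
    LEN: float = 0.0                     # dual length of the attached k-stick, 0 if none
    MAX: float = 0.0                     # maximum LEN in the subtree rooted here
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    parent: Optional["Node"] = None
    neighbor: Optional["Node"] = None    # in-order successor
    twin: Optional["Node"] = None        # same vertex in the adjacent level


def refresh_max(node: Optional[Node]) -> None:
    """Recompute MAX on the path from `node` up to the root."""
    while node is not None:
        node.MAX = max(
            node.LEN,
            node.left.MAX if node.left else 0.0,
            node.right.MAX if node.right else 0.0,
        )
        node = node.parent


root = Node("edge", LEN=1.0)
child = Node("vertex", LEN=2.5, parent=root)
root.left = child
refresh_max(child)
print(root.MAX)   # the widest k-stick in the tree is the child's
```

After a LEN field changes, only the ancestors of that node need their MAX fields refreshed, which costs time proportional to the height of the tree.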

TEMP-list : After the addition of a new point p in S, its dual line h = D(p)
is inserted in A(H) to get an updated arrangement A(H'), where H' = H ∪ {h}.
This may cause redefining the k-sticks of some vertices and edges of A(H').

In order to store this information, we use a linear linked list at each level θ of the
primary structure. Each element of this list is a tuple (ℓ, r). Here ℓ and r point
to two elements (vertex/edge) at level θ, and the tuple (ℓ, r) represents a set of
consecutive elements (vertices/edges) in T_θ such that the k-sticks defined by all
the vertices and edges in that set have been redefined due to the appearance of
the new line h in the dual plane. Note that:

The list attached to a particular level, say θ, of the arrangement may contain
more than one tuple after computing the k-sticks at all the vertices and
edges of A(H) affected by the inclusion of h. In that case, the sets of elements
represented by two distinct tuples, say (ℓ1, r1) and (ℓ2, r2), in that list must
be disjoint, i.e., r1 < ℓ2. Moreover, the elements represented by r1 and ℓ2
must not be consecutive in T_θ.

ℒ-list : We shall use another temporary linear linked list (ℒ) during the processing
of an insertion/deletion of a line h in the arrangement A(H). This contains the
intersection points of h with the lines in H in left-to-right order.



3.2 Updating the Primary Structure 

We first compute the leftmost intersection point of the newly inserted line h with
the existing lines in H by comparing all the lines. Let the intersection point be a
and the corresponding line be h_i. In order to find the edge e* ∈ A(H) on which
a lies, we traverse along the line h_i from its left unbounded edge towards the right
using the neighbor-pointers and self-indicators.
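
The first step, finding the leftmost intersection by comparing h with every line of H, takes linear time. A minimal sketch, storing each dual line y = a·x − b as the pair (a, b) and assuming, as in the general position assumption, that no line of H is parallel to h; the function names are illustrative. The second function lists all intersection points in the left-to-right order used for the temporary list of intersections.

```python
def leftmost_intersection(h, lines):
    """Return (x, y, index) of the leftmost intersection of h with `lines`."""
    a, b = h
    best = None
    for i, (ai, bi) in enumerate(lines):
        if ai == a:
            continue                      # parallel lines never meet
        x = (b - bi) / (a - ai)           # solve a*x - b == ai*x - bi
        if best is None or x < best[0]:
            best = (x, a * x - b, i)
    return best


def intersections_left_to_right(h, lines):
    """All intersection points of h with `lines`, sorted by x-coordinate."""
    a, b = h
    hits = [((b - bi) / (a - ai), i) for i, (ai, bi) in enumerate(lines) if ai != a]
    return sorted(hits)


H = [(0, 0), (1, 0), (-1, 0)]             # y = 0, y = x, y = -x
h = (2, 4)                                # new line y = 2x - 4
print(leftmost_intersection(h, H))
print([i for _, i in intersections_left_to_right(h, H)])
```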

Next, we use parent pointers from the edge e* up to the root of T_θ* and finally, the
parent pointer of the root points to the primary structure record corresponding
to the level θ*. The level of the left unbounded edge e on the newly inserted
line h in the updated structure A(H') will be θ (= θ* or θ* + 1) depending on
whether h intersects e* from below or from above.

We replace the old primary structure by a new array of size n + 1, and insert
a new level corresponding to the left unbounded edge e in the appropriate place.
Moreover, the list-ptr for all the levels is initialized to NULL. It is easy to see
that the updating of the primary structure requires O(n) time.

Our updating of the secondary structure will be guided by two pointers, P1 and
P2, which will initially contain the edges which are just
below and above a respectively.



3.3 Updating the Secondary Structure

Let the level of e (the unbounded portion of h to the left of a) be θ in A(H'),
and the edge e* (∈ A(H)), which is intersected by h, be at level θ* (= θ − 1 or
θ + 1). In this subsection, we describe the creation of the new edges and vertices
generated due to the inclusion of h in A(H).

The portions of h to the left and right of a are denoted as e and e' respectively,
and the portions of e* to the left and right of the point a by e*_left and e*_right
respectively. Note that the vertex a appears in both the levels θ and θ*. Next,
we make the following changes in the secondary structure for the inclusion of the
new vertex a and its adjacent edges in the level-structure. Refer to Figure 3.




Fig. 3. Processing a new vertex of A{H') 



• e and the vertex a are added in T_θ.

• e*_left remains in its previous level θ*, so e* is replaced by e*_left in T_θ*.

• e*_right goes to level θ. So, first of all e*_right is added to T_θ.

• Let v be the vertex at the right end of e* (recently modified to e*_left) in T_θ*.
The tree T_θ* is split into two height balanced trees, say T_R and T_L, where
T_R contains all the elements (vertices and edges) in that level to the right of
v including itself, and T_L contains all the elements in the same level to the
left of e*_left and including itself. This requires O(log n) time [5].

• Next we concatenate T_R to the right of e*_right in T_θ. The neighbor-pointer of
e*_right is immediately set to point to v, and the parent-pointers of the affected
nodes are appropriately adjusted. This can be done in O(log n) time [5].

• Finally, the vertex a is added in T_L as the rightmost element, and T_L is
renamed as T_θ*. Note that a portion of e' will be the right neighbor of the
vertex a in T_θ*. Now, if we have already considered all the n newly generated
vertices, the right end of e' will be unbounded. In that case, e' is added in
T_θ* as the rightmost element, and its neighbor pointer and parent pointer are
appropriately set. Otherwise, the right end of e' is yet to be defined, and its
addition in T_θ* is deferred until the detection of the next intersection.

In the former case, the updating of the secondary structure is complete. But in
the latter case, we proceed with e', the portion of h to the right of a. First of all,
we set e to e'. Now, (i) if θ* = θ − 1, then P2 is set to e*_right and P1 needs to
be set to an appropriate edge in level θ − 2, and (ii) if θ* = θ + 1, then P1 is set
to e*_right and P2 needs to be set to an appropriate edge in level θ + 2. Finally,
the current level θ is set to θ*, and we proceed to detect the next intersection.

Now two important things need to be mentioned. 

• For all newly created edges/vertices, we set the width of the k-dense corridor
to 0. They will be computed afresh after the update of the level-structure.

• During this traversal, we create ℒ with all the newly created edges and
vertices on h in a left-to-right order. The edges are attached with their
corresponding levels. As the newly created vertices appear in two consecutive
levels, they show their lower levels in the ℒ list.



Lemma 1. The time required for constructing A(H') from the existing A(H) is
O(n log n) in the worst case. □



3.4 Computing the New k-Dense Corridors

We now describe the method of computing all the k-sticks which intersect the
newly inserted line h. The ℒ list contains the pieces of the line h separated by
the vertices in (A(H') − A(H)), which will guide our search process. We process
the edges in the ℒ list one by one from left to right. For each edge e ∈ ℒ, we locate
all the vertices and edges of A(H') whose corresponding k-sticks intersect e.

We proceed with an array of pointers P of size 2k + 3, indexed by −(k + 1),
..., 0, ..., (k + 1). Initially, P(0) points to the leftmost edge in the ℒ list. If
its level in the level-structure is θ, then P(−1), ..., P(−k − 1) will point to the
leftmost edges at levels (θ − 1), ..., (θ − k − 1), and P(1), ..., P(k + 1) will point to
the leftmost edges at levels (θ + 1), ..., (θ + k + 1) in the level-structure.

While processing an edge e ∈ ℒ (not the leftmost edge), P(0) points to the edge
e; P(−1), ..., P(−k − 1) point to the k + 1 edges below the left end vertex of e, and
P(1), ..., P(k + 1) point to the k + 1 edges above the left end vertex of e. At the end
of the processing of edge e, if e is not right unbounded, we set P(0) to the next
edge of e in ℒ and proceed. Otherwise, our search stops.

In order to evaluate all the k-sticks intersecting e and having their bottom ends at
level i (i = θ − k − 1, ..., θ), we need to consider the pair of levels (i, i + k + 1).
Consider the x-monotone polygon bounded below (resp. above) by the x-monotone
chain of edges and vertices at level i (resp. i + k + 1) and by two vertical lines at
the end points of e (see Figure 4). This can easily be detected using the pointers
P(i − θ) and P(i + k + 1 − θ). The LEN-fields of all the edges and vertices in the
above two x-monotone chains are initialized to zero. We draw vertical lines at
each vertex of the upper and lower chains, which split the polygon into a number
of vertical trapezoids.




Fig. 4. Computation of type-A and type-B corridors while processing an edge
e ∈ ℒ

Each of the vertical lines drawn from the convex vertices defines a type-A
k-stick. Its dual distance is put in the corresponding node of T_i or T_{i+k+1}.

In order to compute the type-B k-sticks, consider each of the vertical trapezoids
from left to right. Let Δ = v1v2w2w1 be such a trapezoid whose sides v1w1 and
v2w2 are vertical, and let I denote the x-range of Δ. Let v1v2 be a portion
of an edge e*, which in turn lies on a line h* ∈ H', and let w1w2 lie on h** ∈ H'.
We compute −1/x(h* ∩ h**) and check whether it lies in I. If so, the vertical
line at x = −1/x(h* ∩ h**), bounded by v1v2 and w1w2, indicates a type-B
k-stick corresponding to the edge e*. We compute its dual distance; this
newly computed k-stick replaces the current one attached with e* provided
the dual length of the newly computed k-stick is greater than the LEN-field
attached with e* in the data structure T_i.

Let ℓ_i and r_i (resp. ℓ_{i+k+1} and r_{i+k+1}) denote the edges at level i (resp. i + k + 1)
which are intersected by the vertical lines at the left and right end points of the
edge e (∈ ℒ) respectively. Note that the definition of k-sticks for the edges and
vertices of the i-th level between ℓ_i and r_i may have changed due to the presence
of e ∈ ℒ. So, we need to store the tuple (ℓ_i, r_i) in the TEMP-list attached
to level i of the primary structure. But, before storing it, we need to check the
last element stored in that list, say (ℓ*, r*). If the neighbor pointer of r* points
to ℓ_i in T_i (the secondary structure at level i), then (ℓ*, r_i) is a continuous set of
elements in level i which are affected due to the insertion of h. So, the element
(ℓ*, r*) is updated to (ℓ*, r_i); otherwise, (ℓ_i, r_i) is added to the TEMP-list. We
store (ℓ_{i+k+1}, r_{i+k+1}) in the TEMP-list attached to level i + k + 1 of the primary
structure in a similar way.
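
The TEMP-list bookkeeping described above amounts to merging an incoming interval with the last stored one whenever the two are adjacent. A minimal Python sketch follows; the function name `append_interval`, the string keys, and the `right_neighbor` dictionary (standing in for the neighbor pointers of the secondary structure) are assumptions made for illustration, not part of the paper.

```python
def append_interval(temp_list, left, right, right_neighbor):
    """Append (left, right) to temp_list, merging it with the last entry
    when that entry's right end is immediately followed by `left`."""
    if temp_list:
        last_left, last_right = temp_list[-1]
        if right_neighbor.get(last_right) == left:
            temp_list[-1] = (last_left, right)   # contiguous: extend the entry
            return
    temp_list.append((left, right))              # gap: start a new entry


# Elements a..f of one level, in left-to-right order (hypothetical data).
order = ["a", "b", "c", "d", "e", "f"]
right_neighbor = dict(zip(order, order[1:]))

temp = []
append_interval(temp, "a", "b", right_neighbor)
append_interval(temp, "c", "c", right_neighbor)  # follows "b": merged
append_interval(temp, "e", "f", right_neighbor)  # "d" is skipped: new entry
print(temp)
```

Because each incoming tuple is compared only with the last stored one, the list stays sorted from left to right and each insertion costs constant time.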
Next, we may proceed by setting P(0) to the next edge e' of the ℒ list. Each of the
pointers P(i), i = −k − 1, ..., k + 1, excepting P(0), needs to point to an edge
either at level (i + θ − 1) or at level (i + θ + 1) which lies just below or above the
edge currently pointed to by P(i) in the level-structure, depending on whether e' lies
at level θ − 1 or θ + 1 in the level-structure. From the adjacencies of vertices and
edges in the level-structure, this can be done in constant time for each P(i).

Theorem 2. The time required for computing the k-sticks intersecting the line
h in A(H') is O(nk).

Proof: Follows from the above discussions, and the fact that the complexity of
the (≤ k)-levels of n half-lines lying above (below) the newly inserted line h in the
arrangement A(H') is O(nk) [6]. □



3.5 Location of Widest k-Dense Corridor

The TEMP-list for a level, say θ, created in the earlier subsection, is used to
update the MAX-field of the nodes of the tree T_θ, by considering its elements
in a sequential manner from left to right. Let the tuple (ℓ, r) be an entry of the
TEMP-list at the level θ. Let q be the common predecessor of the set of nodes
represented by the tuple (ℓ, r). Let P_in be the path from the root of T_θ to the
node q, and let P_ℓ and P_r be the two paths from q to ℓ and from q to r respectively. In T_θ,
the MAX-fields of all the nodes in the interval (ℓ, r), and of the nodes on P_in,
P_ℓ and P_r may be changed. So, they need to be inspected in order to update
the MAX-fields of the nodes in T_θ. Now we have the following lemma.

Lemma 2. For each entry (ℓ, r) of the TEMP-list of a level, say θ, the number
of nodes of T_θ which need to be visited to update the MAX-fields is O(log n + x),
where x is the number of consecutive vertices and edges of the arrangement at
level θ represented by (ℓ, r). □

In order to count the total number of elements attached to the TEMP-lists at
all the levels, let us consider an n × n square grid M whose rows represent the n
levels of the arrangement and each of whose columns represents an edge on h, in
left-to-right order (see Figure 5). Consider the shaded portion of the grid; observe that
its i-th column spans from row θ − k − 1 to θ + k + 1, where θ is the level of e_i
in A(H'). This corresponds to the levels which are affected by e_i. The shaded
region is bounded by two x-monotone chains. Now, let us define a horizontal
strip as a set of consecutive cells on a row which belong to the shaded portion
of the grid. A horizontal strip which spans from the first to the last column of
the grid is referred to as a long strip. The strips which are not long are called
short strips. Note that each strip attached to a row represents an element of
the TEMP-list attached to the corresponding level. It is easy to observe that the
number of such short strips is O(n), and the number of such long strips may be
at most 2k − 3 in the worst case. Now we have the following lemma.
Fig. 5. Grid M for estimating the number of elements in the TEMP-list



Lemma 3. In order to update the MAX-field of the nodes of the secondary structure,
the tree traversal time for an entry of any of the TEMP-lists is O(x + log n)
if the corresponding strip is short, and is O(x) if the strip is long, where x, the
length of the strip, is the number of nodes represented by the corresponding entry
of the TEMP-list.

Proof: For the short strips, the result follows from Lemma 2. For the long strips,
all the nodes of the corresponding tree are affected. So, x subsumes O(log n). □
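
The grid argument can be made concrete with a small Python sketch that enumerates the horizontal strips of the shaded region and separates long strips from short ones. The function name `shaded_strips`, the 0-based row numbering, and the sample `levels` list are illustrative assumptions only.

```python
def shaded_strips(levels, k, n_rows):
    """Return the maximal horizontal runs of shaded cells as
    (row, first_col, last_col) triples.

    levels[c] is the level of the c-th edge of h; column c is shaded on
    rows levels[c] - k - 1 .. levels[c] + k + 1 (clipped to the grid).
    """
    n_cols = len(levels)
    strips = []
    for row in range(n_rows):
        start = None
        for c in range(n_cols):
            shaded = levels[c] - k - 1 <= row <= levels[c] + k + 1
            if shaded and start is None:
                start = c                          # a strip begins here
            elif not shaded and start is not None:
                strips.append((row, start, c - 1)) # the strip ended at c - 1
                start = None
        if start is not None:
            strips.append((row, start, n_cols - 1))
    return strips


levels = [3, 4, 5, 4, 3]          # hypothetical levels of five edges on h
strips = shaded_strips(levels, k=0, n_rows=8)
long_strips = [s for s in strips if s[1] == 0 and s[2] == len(levels) - 1]
short_strips = [s for s in strips if s not in long_strips]
print(len(long_strips), len(short_strips))
```

Each strip found this way corresponds to one entry of a TEMP-list, so summing the strip lengths gives the total number of nodes visited apart from the O(log n) path overhead of the short strips.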

Finally, the roots of the trees at all levels need to be inspected to determine the 
widest k-dense corridor. 



3.6 Complexity 

Theorem 3. Addition of a new point requires

(a) O(n log n) time for updating the data structure,

(b) O(nk) time to compute the k-dense corridors containing that point, and

(c) O(nk + n log n) time to traverse the trees attached to different levels of the
secondary structure for reporting the widest k-dense corridor. □

As we are preserving all the vertices and edges of the arrangement of the dual
lines in our proposed level-structure, the space complexity is O(n^2), where n is
the number of points on the floor at the current instant of time.
Abstract. The problem of reconstructing a discrete set from its X-rays
in a finite number of prescribed directions is NP-complete when the
number of prescribed directions is greater than two. In this paper, we
consider an interesting subclass of discrete sets having some connectivity
and convexity properties and we provide a polynomial-time algorithm
for reconstructing a discrete set of this class from its X-rays in directions
(1, 0), (0, 1) and (1, 1). This algorithm can be easily extended to contexts
having more than three X-rays.

Keywords: algorithms, combinatorial problems, discrete tomography,
discrete sets, X-rays.



1 Introduction 

A discrete set is a finite subset of the integer lattice and can be represented
by a binary matrix or a set of unitary squares. A direction is a vector of the
Euclidean plane. If u is a direction, we denote the line through the origin parallel
to u by l_u. Let F be a discrete set; the X-ray of F in direction u is the function
X_u F, defined as: X_u F(x) = |F ∩ (x + l_u)| for x ∈ u^⊥, where u^⊥ is the orthogonal
complement of u (see Fig. 1). The function X_u F is the projection of F on u^⊥
counted with multiplicity. The inverse problem of reconstructing a discrete set
from its X-rays is of fundamental importance in fields such as image processing
[17], statistical data security [14], biplane angiography [16], graph theory [1] and
the reconstruction of crystalline structures from X-rays taken by an electron
microscope [15]. An overview of the problems in discrete tomography and a study
of their complexity can be found in [12] and [7]. Many authors have studied the
problem of determining a discrete set from its X-rays in both horizontal and
vertical directions. Some polynomial-time algorithms that reconstruct special
sets having some convexity and/or connectivity properties, such as horizontally
and vertically convex polyominoes [3,4], have been determined.
Fig. 1. X-rays in the directions u_1 = (1,0), u_2 = (0,1) and u_3 = (1,1).



In this paper, we study the reconstruction problem with respect to three
directions: (1,0), (0,1) and (1,1). We denote the X-rays in these directions
by H, V and D, respectively. The basic question is to determine whether, given
H ∈ N^m, V ∈ N^n and D ∈ N^{n+m−1}, a discrete set F whose X-rays are (H, V, D)
exists. Let {u_1, ..., u_k} be a finite set of prescribed directions. The general
problem can be formulated as follows:

Consistency(u_1, ..., u_k)

Instance: k vectors X_1, ..., X_k.

Question: is there F such that X_{u_i} F = X_i for i = 1, ..., k?

Gardner, Gritzmann and Prangenberg [8] proved that Consistency((1, 0), (0, 1),
(1, 1)) is NP-complete in the strong sense. Then, by means of this result and two
lemmas, they proved that the problem Consistency(u_1, ..., u_k) is NP-complete
in the strong sense, for k ≥ 3.

In this paper, we determine a class of discrete sets for which the problem
is solvable in polynomial time. These sets are hex-connected and convex in the
horizontal, vertical and diagonal directions. They can be represented by a set
of hexagonal cells. By exploiting the new geometric properties of these structures,
we can provide an algorithm which finds a solution in O(n^5) time, where we
assume n = m. The algorithm can be easily extended to contexts having more
than three X-rays and can reconstruct some discrete sets that are convex in the
directions of the X-rays. We wish to point out that the question of determining
when planar convex bodies can be reconstructed from their X-rays was raised by
Hammer [6,11] in 1963. The discrete analogue of this question, raised by Gritzmann
[10] in 1997, is an open problem. We believe that our algorithm can be
considered to be an initial approach to this problem in so far as it reconstructs
a discrete set which is convex in the directions of the X-rays.

2 Definitions and Preliminaries 

We now wish to examine an interesting class of discrete sets for which the problem
can be solved in polynomial time. We introduce some definitions which allow
us to characterize this class. Let us take into consideration the triangular lattice
made up of the directions (1,0), (0,1) and (1,1). A point Q = (i, j) of this lattice
has 6 neighbours and can be represented by a hexagonal cell (see Fig. 2).




Fig. 2. The 6-neighbours of Q = (i, j).



Definition 1. If F is a discrete set, a 6-path from P to Q in F is a sequence
P_1, ..., P_s of points in F, where P = P_1, Q = P_s and P_t is a 6-neighbour of
P_{t−1}, for t = 2, ..., s.

It can be noted that the sequence with s = 1 is also a 6-path.

Definition 2. F is hex-connected if, for each pair of points of F, there is a 6-path
in F that connects them.

Finally, 

Definition 3. A hex-connected set is horizontally, vertically and diagonally convex
if all its rows, columns and diagonals are hex-connected.

We denote the class of hex-connected and horizontally, vertically and diagonally
convex sets by ℱ. An element of this class corresponds to a convex polyomino
with hexagonal cells. Fig. 3 shows a hex-connected discrete set and a discrete set
of ℱ with its corresponding hexagonal convex polyomino. This class of hexagonal
polyominoes was studied by some authors in enumerative combinatorics [5,18].
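
Definitions 1 to 3 translate directly into a membership test for the class. The Python sketch below is only illustrative: it assumes matrix coordinates (i, j) in which the diagonal through a point is the set of points with the same i + j, so that the six neighbours are the ones listed in `STEPS`; the function names and the two sample sets are hypothetical.

```python
from collections import deque

# The six neighbours of (i, j), assuming the diagonal through (i, j)
# is the set of points with the same value of i + j.
STEPS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]


def is_hex_connected(points):
    """Breadth-first search along 6-paths (Definitions 1 and 2)."""
    points = set(points)
    if not points:
        return True
    start = next(iter(points))
    seen, queue = {start}, deque([start])
    while queue:
        i, j = queue.popleft()
        for di, dj in STEPS:
            q = (i + di, j + dj)
            if q in points and q not in seen:
                seen.add(q)
                queue.append(q)
    return seen == points


def lines_are_contiguous(points):
    """True if every row, column and diagonal is a run of consecutive cells."""
    rows, cols, diags = {}, {}, {}
    for i, j in set(points):
        rows.setdefault(i, []).append(j)
        cols.setdefault(j, []).append(i)
        diags.setdefault(i + j, []).append(i)
    for family in (rows, cols, diags):
        for coords in family.values():
            if max(coords) - min(coords) + 1 != len(coords):
                return False
    return True


def in_class(points):
    """Hex-connected and convex in all three directions (Definition 3)."""
    return is_hex_connected(points) and lines_are_contiguous(points)


plus = {(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)}
bent = {(1, 1), (1, 2), (2, 2), (3, 2), (3, 1)}   # column 1 has a gap
print(in_class(plus), in_class(bent))
```

The second set is hex-connected but not vertically convex, which shows that the two conditions of Definition 3 are independent.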

3 The Reconstruction Algorithm 

Let us now consider the reconstruction problem applied to class ℱ. Given H =
(h_1, ..., h_m), V = (v_1, ..., v_n) and D = (d_1, ..., d_{n+m−1}), we want to establish
the existence of a set F ∈ ℱ such that X_(1,0) F = H, X_(0,1) F = V, X_(1,1) F = D.
A necessary condition for the existence of this set is:

\[ \sum_{i=1}^{m} h_i \;=\; \sum_{j=1}^{n} v_j \;=\; \sum_{k=1}^{n+m-1} d_k \qquad (3.1) \]
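
As an illustration of the three X-rays and of the necessary condition (3.1), the following Python sketch computes H, V and D for a small set of cells. It assumes 1-based matrix coordinates and takes the diagonal index of (i, j) to be k = i + j − 1; the function names and the sample set are hypothetical.

```python
def xrays(points, m, n):
    """Row, column and diagonal X-rays of a set of (i, j) cells, 1-based.
    The diagonal index of (i, j) is taken to be k = i + j - 1."""
    H = [0] * m
    V = [0] * n
    D = [0] * (n + m - 1)
    for i, j in points:
        H[i - 1] += 1
        V[j - 1] += 1
        D[i + j - 2] += 1
    return H, V, D


def satisfies_3_1(H, V, D):
    """Necessary condition (3.1): the three X-rays have the same total."""
    return sum(H) == sum(V) == sum(D)


plus = {(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)}
H, V, D = xrays(plus, 3, 3)
print(H, V, D, satisfies_3_1(H, V, D))
```

The condition is only necessary: vectors with equal totals need not be the X-rays of any discrete set.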
Fig. 3. a) A hex-connected set. b) A set of ℱ. c) Its corresponding hexagonal
convex polyomino.



Without loss of generality, we can assume that h_i ≠ 0 for i = 1, ..., m, and
v_j ≠ 0 for j = 1, ..., n. From this assumption and the definition of ℱ, it follows
that there are two integers l_1 and l_2 such that:

1 ≤ l_1 ≤ l_2 ≤ n + m − 1,

d_k ≠ 0 for k = l_1, ..., l_2, (3.2)

d_k = 0 for k = 1, ..., l_1 − 1, l_2 + 1, ..., n + m − 1;

this in turn means that F is contained in the discrete hexagon:

A = {(i, j) ∈ N^2 : 1 ≤ i ≤ m, 1 ≤ j ≤ n, l_1 ≤ i + j − 1 ≤ l_2}.

A is the smallest discrete hexagon containing F. We wish to point out that
the X-ray vector components are numbered starting from the left-upper corner
(see Fig. 3b)). We call bases of F the points belonging to the boundary of A
(see Fig. 3c)). Our aim is to determine which points of A belong to F. Let
Q = (i, j) ∈ A; this point defines the following six zones (see Fig. 4):




Fig. 4. The six zones determined by Q = (i, j).



Reconstruction of Discrete Sets from Three or More X-Rays 






Z_1(Q) = {(r, c) ∈ A : r ≤ i and c ≤ j},

Z_2(Q) = {(r, c) ∈ A : j ≤ c and r + c ≤ i + j},

Z_3(Q) = {(r, c) ∈ A : r ≤ i and i + j ≤ r + c},

Z_4(Q) = {(r, c) ∈ A : i ≤ r and j ≤ c},

Z_5(Q) = {(r, c) ∈ A : c ≤ j and i + j ≤ r + c},

Z_6(Q) = {(r, c) ∈ A : i ≤ r and r + c ≤ i + j}.

Each zone contains Q. Moreover, from the definition of the class ℱ, it follows
that, if Q does not belong to F, there are two consecutive zones which do not
contain any point of F. For example, the point Q in Fig. 4 does not belong to F,
and the intersection between F and Z_5(Q) ∪ Z_6(Q) is the empty set. By setting
Z̄_k(Q) = Z_k(Q) ∪ Z_{k+1}(Q), with k = 1, ..., 5, and Z̄_6(Q) = Z_6(Q) ∪ Z_1(Q), we
obtain:

Property 1. Let Q be a point of the smallest discrete hexagon A containing a
discrete set F of ℱ. The point Q ∈ F if and only if Z̄_k(Q) ∩ F ≠ ∅, for each
1 ≤ k ≤ 6.
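
Property 1 can be checked mechanically on a small example. The Python sketch below assumes that every inequality in the six zone definitions is non-strict (so that each zone contains Q, as stated above) and that the diagonal index of (i, j) is i + j − 1; the helper names and the sample set are hypothetical.

```python
def hexagon(m, n, l1, l2):
    """The discrete hexagon A: 1 <= i <= m, 1 <= j <= n, l1 <= i+j-1 <= l2."""
    return {(i, j) for i in range(1, m + 1) for j in range(1, n + 1)
            if l1 <= i + j - 1 <= l2}


def zones(Q, A):
    """The six zones Z1..Z6 of Q inside A (non-strict inequalities assumed)."""
    i, j = Q
    tests = [
        lambda r, c: r <= i and c <= j,            # Z1
        lambda r, c: j <= c and r + c <= i + j,    # Z2
        lambda r, c: r <= i and i + j <= r + c,    # Z3
        lambda r, c: i <= r and j <= c,            # Z4
        lambda r, c: c <= j and i + j <= r + c,    # Z5
        lambda r, c: i <= r and r + c <= i + j,    # Z6
    ]
    return [{(r, c) for (r, c) in A if t(r, c)} for t in tests]


def passes_property_1(Q, A, F):
    """True if every union of two consecutive zones of Q meets F."""
    Z = zones(Q, A)
    return all((Z[k] | Z[(k + 1) % 6]) & F for k in range(6))


F = {(1, 2), (2, 1), (2, 2), (2, 3), (3, 2)}   # a small set of the class
A = hexagon(3, 3, 2, 4)                        # its smallest hexagon
members = {Q for Q in A if passes_property_1(Q, A, F)}
print(sorted(members))
```

On this example the two corner cells (1, 3) and (3, 1) of A fail the test, each because one pair of consecutive zones contains no point of F, and the remaining cells are exactly F.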

We can determine some points of F just by referring to the geometry of A. Let
I_1 = (l_2 − n + 1, n), I_2 = (l_1, 1), J_1 = (1, l_1), J_2 = (m, l_2 − m + 1), K_1 =
(m, 1), K_2 = (1, n). These points are the vertices of A as shown in Fig. 5. Let




Fig. 5. The points Q_1 and Q_2 belonging to every discrete set of ℱ contained in
hexagon A.



i_t, j_t and k_t be the row, column and diagonal index containing I_t, J_t and K_t,
respectively, with t = 1, 2. Note that i_2 = j_1 = l_1, i_1 = l_2 − n + 1, j_2 = l_2 − m + 1,
k_1 = m and k_2 = n. The hexagon illustrated in Fig. 5 is such that: i_1 > i_2, j_1 <
j_2 and k_1 > k_2. Since A is the smallest discrete hexagon containing F, the
sides I_2J_1 and J_1K_2 each contain at least one base of F. So, if Q = (i, j) ∈ A is such
that i ≥ i_2 or k ≥ k_2, with k = i + j − 1, then Z̄_1(Q) ∩ F ≠ ∅ (see Fig. 5). We
proceed in the same way for the other five pairs of consecutive sides:
i ≥ i_1 or j ≤ j_1 ⇒ Z̄_2(Q) ∩ F ≠ ∅,
j ≤ j_2 or k ≤ k_2 ⇒ Z̄_3(Q) ∩ F ≠ ∅,
i ≤ i_1 or k ≤ k_1 ⇒ Z̄_4(Q) ∩ F ≠ ∅,
i ≤ i_2 or j ≥ j_2 ⇒ Z̄_5(Q) ∩ F ≠ ∅,
j ≥ j_1 or k ≥ k_1 ⇒ Z̄_6(Q) ∩ F ≠ ∅,

where k = i + j − 1. The points Q_1 = (i_2, j_1) and Q_2 = (i_1, j_2) verify these six
inequalities and so, by Property 1, Q_1 and Q_2 belong to F. Fig. 6 illustrates





Fig. 6. The six configurations of hexagon A. 



the six allowed configurations of A and the points Q_1 and Q_2 belonging to each
discrete set of ℱ contained in A. We can divide these configurations into three
groups:

a. if i_2 ≤ i_1, j_1 ≤ j_2, then Q_1 = (i_2, j_1) and Q_2 = (i_1, j_2) (see Fig. 6a));

b. if j_2 ≤ j_1, k_2 ≤ k_1, then Q_1 = (k_1 − j_2 + 1, j_2) and Q_2 = (k_2 − j_1 + 1, j_1)
(see Fig. 6b));

c. if i_1 ≤ i_2, k_1 ≤ k_2, then Q_1 = (i_2, k_1 − i_2 + 1) and Q_2 = (i_1, k_2 − i_1 + 1) (see
Fig. 6c)).

We refer to these configurations as case a, case b and case c. We notice that, if
i_1 = i_2, j_1 = j_2 and k_1 = k_2, we find one point (Q_1 = Q_2) of F, and if i_1 = i_2
or j_1 = j_2 or k_1 = k_2, we find more than two points of F.

Let us now determine a 6-path from Q_1 to Q_2 made up of points of F. In case a,
Q_1 = (i_2, j_1) and the two points P_1 = (i_2 + 1, j_1) and P_2 = (i_2, j_1 + 1) adjacent
to Q_1 are such that:

Z̄_k(P_1) ∩ F ≠ ∅, for k = 1, 2, 3, 4, 6, and Z̄_k(P_2) ∩ F ≠ ∅, for k = 1, 3, 4, 5, 6,
(see Fig. 6a)). From Property 1, we can deduce that:
if Z̄_5(P_1) ∩ F ≠ ∅, then P_1 ∈ F, and if Z̄_2(P_2) ∩ F ≠ ∅, then P_2 ∈ F.

We prove that P_1 ∈ F or P_2 ∈ F. Let us consider the following cumulated sums
of the row, column and diagonal X-rays:

H_0 = 0, H_k = Σ_{i=1}^{k} h_i, k = 1, ..., m,

V_0 = 0, V_k = Σ_{i=1}^{k} v_i, k = 1, ..., n,

D_{l_1−1} = 0, D_k = Σ_{i=l_1}^{k} d_i, k = l_1, ..., l_2,

and denote the common total sum of the row, column and diagonal X-rays by
S. We have that:

Lemma 1. Let Q = (i, j) be a point of hexagon A containing a discrete set F
of ℱ.



- If H_i ≥ S − D_{i+j−1}, then Z̄_1(Q) ∩ F ≠ ∅.

- If H_i ≥ V_{j−1}, then Z̄_2(Q) ∩ F ≠ ∅.

- If S − D_{i+j−2} ≥ V_{j−1}, then Z̄_3(Q) ∩ F ≠ ∅.

- If S − D_{i+j−2} ≥ H_{i−1}, then Z̄_4(Q) ∩ F ≠ ∅.

- If V_j ≥ H_{i−1}, then Z̄_5(Q) ∩ F ≠ ∅.

- If V_j ≥ S − D_{i+j−1}, then Z̄_6(Q) ∩ F ≠ ∅.



Proof. We denote the first j columns and the first i − 1 rows of the set F by
C_j and R_{i−1}, respectively. If Z̄_5(Q) ∩ F = ∅, then C_j ⊂ R_{i−1} and so V_j < H_{i−1}
(see Fig. 7). Therefore, if V_j ≥ H_{i−1}, then Z̄_5(Q) ∩ F ≠ ∅. We obtain the other
statements in the same way.

Fig. 7. A point Q ∉ F and Z̄_5(Q) ∩ F = ∅.
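
The cumulated sums and the counting argument of the proof can be tried out on a small instance. In the Python sketch below, the X-ray vectors belong to the hypothetical five-cell set {(1,2), (2,1), (2,2), (2,3), (3,2)}; the inequality is read as non-strict, which is an assumption about the garbled source, and the function names are illustrative. Since d_k = 0 for k < l_1, ordinary prefix sums of D agree with the sums D_k defined above.

```python
from itertools import accumulate


def cumulated(H, V, D):
    """Prefix sums with a leading 0, so that Hc[k] = h_1 + ... + h_k."""
    Hc = [0] + list(accumulate(H))
    Vc = [0] + list(accumulate(V))
    Dc = [0] + list(accumulate(D))
    return Hc, Vc, Dc


def certifies_zone_5(i, j, Hc, Vc):
    """Fifth statement of the lemma: V_j >= H_{i-1} forces a point of F
    in the union of zones 5 and 6 of Q = (i, j) (non-strict reading)."""
    return Vc[j] >= Hc[i - 1]


H, V, D = [1, 3, 1], [1, 3, 1], [0, 2, 1, 2, 0]
Hc, Vc, Dc = cumulated(H, V, D)
S = Hc[-1]

# Q = (2, 2) is in the set: V_2 = 4 >= H_1 = 1, so the test succeeds.
# Q = (3, 1) is not: V_1 = 1 < H_2 = 4, so the test gives no certificate.
print(S, certifies_zone_5(2, 2, Hc, Vc), certifies_zone_5(3, 1, Hc, Vc))
```

The test is one-directional: a failed comparison does not prove that the zone is empty, it only means that this particular counting argument gives no information.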



Theorem 1. There exists a 6-path from Q_1 to Q_2 made up of points of F.
Proof. By Lemma 1 and the previous discussion:

if V_{j_1} ≥ H_{i_2}, then P_1 = (i_2 + 1, j_1) ∈ F, and if H_{i_2} ≥ V_{j_1}, then P_2 = (i_2, j_1 + 1) ∈ F.

Since V_{j_1} ≥ H_{i_2} or H_{i_2} ≥ V_{j_1}, we have P_1 ∈ F or P_2 ∈ F. We can repeat the same
operation on the two points adjacent to the point P_k ∈ F (with k = 1 or k = 2)
determined in the previous step. We wish to point out that if V_{j_1} = H_{i_2}, then
P_1 ∈ F and P_2 ∈ F. In this case, Z̄_k(i_2 + 1, j_1 + 1) ∩ F ≠ ∅, for each 1 ≤ k ≤ 6.
So (i_2 + 1, j_1 + 1) ∈ F and we repeat the operation on its two adjacent points.
We perform this procedure until it determines a point P ∈ F which belongs to
the row or the column containing Q_2 = (i_1, j_2) (i.e., P = (i_1, j) or P = (i, j_2)).
Every point Q between P and Q_2 = (i_1, j_2) is such that Z̄_k(Q) ∩ F ≠ ∅, for each
1 ≤ k ≤ 6, and so Q ∈ F. By means of this procedure, we are able to find a
6-path from Q_1 to Q_2 made up of points of F.

For example, if H = (1, 4, 5, 7, 9, 7, 5, 3, 2) and V = (1, 3, 3, 4, 6, 4, 6, 5, 4, 3, 3, 1),
we obtain the 6-path from Q_1 to Q_2 shown in Fig. 8. We treat the other two cases
in the same way. From Lemma 1, it follows that we have to use the cumulated
sums V_j and S − D_{i+j−1} in case b, and H_i and S − D_{i+j−1} in case c.




Fig. 8. α = β = F ∈ ℱ.



Let us now take the bases into consideration. Each side of hexagon A contains
at least one base of F. Let B_1 and B_2 be a base of the side I_2J_1 and a base of
I_1J_2, respectively. In case a, B_i and Q_i, with i = 1 or i = 2, define a discrete
rectangle R_i such that: if Q ∈ R_i, then Z̄_k(Q) ∩ F ≠ ∅, for each 1 ≤ k ≤ 6, and
so Q ∈ F (see Fig. 8). Notice that R_i can degenerate into a discrete segment
when B_i and Q_i belong to the same row or column. Therefore, we obtain a
hex-connected set made up of points of F and connecting two opposite sides of
A. Unfortunately, we do not usually know the positions of the bases and so our
algorithm chooses a pair of base-points (B_1, B_2) belonging to two opposite sides
of A, and then attempts to construct a discrete set F of ℱ whose X-rays are
equal to H, V and D. If the reconstruction attempt fails, the algorithm chooses
a different position of the base-points (B_1, B_2) and repeats the reconstruction
operations. More precisely, the algorithm determines Q_1, Q_2 and the 6-path from
Q_1 to Q_2; after that it chooses:

- a point B_1 ∈ I_2J_1 and a point B_2 ∈ I_1J_2 in case a, or

- a point B_1 ∈ K_1J_2 and a point B_2 ∈ K_2J_1 in case b, or

- a point B_1 ∈ I_2K_1 and a point B_2 ∈ I_1K_2 in case c,

and then reconstructs the rectangle R_i defined by Q_i and B_i, with i = 1, 2. It
then uses the same reconstruction procedure described in the algorithm defined
in [3], that is, it performs the filling operations in the directions (1,0), (0,1) and
(1,1) and, if necessary, links our problem to the 2-Satisfiability problem, which
can be solved in linear time [2].

We now describe the main steps of this reconstruction procedure. We call any
set α such that α ⊆ F a kernel, and we call any set β such that F ⊆ β ⊆ A
a shell. Assuming that α is the reconstructed hex-connected set from B_1 to B_2
and β is hexagon A, we perform the filling operations that expand α and reduce
β. These operations take advantage of both the convexity constraint and the vectors
H, V, D, and they are used iteratively until α ⊄ β or α and β are invariant with
respect to the filling operations.

If we obtain α ⊄ β, there is no discrete set of ℱ containing B_1 and B_2 and having
X-rays H, V, D. Therefore, the algorithm chooses another pair of base-points and
performs the filling operations again.

If α = β, then α = F and so there is at least one solution to the problem
(the algorithm reconstructs one of them). For example, by performing the filling
operations on the hex-connected set from B_1 to B_2 in Fig. 8, we obtain α = β,
and α is a discrete set having X-rays H, V, D.

Finally, if we obtain α ⊂ β, then β − α is a set of “indeterminate” points and
we are not yet able to say that a set F having X-rays H, V, D exists. Therefore,
we have to perform another operation to establish the existence of F. At first,
α is a hex-connected set from B_1 to B_2, where B_1 and B_2 belong to two opposite
sides of A; therefore by performing the filling operations, we obtain:

- α has at least one point in each diagonal of A in case a;

- α has at least one point in each row of A in case b;

- α has at least one point in each column of A in case c.

Assume that we have case b: there is at least one point of α in each row of A,
and so by the properties of the filling operations (see [3]), the length of the i-th
row of β is smaller than 2h_i for each 1 ≤ i ≤ m. If we are able to prove that:

I) the length of the j-th column of β is equal to, or less than, 2v_j for each
1 ≤ j ≤ n;

II) the length of the k-th diagonal of β is equal to, or less than, 2d_k for each
1 ≤ k ≤ n + m − 1,

there is a polynomial transformation of our reconstruction problem to the
2-Satisfiability problem. We first prove (I) and (II), and we then outline the main
Fig. 9. The kernel, the shell and the set of the indeterminate points.



idea of the reduction to the 2-Satisfiability problem defined in [3]. By the properties
of the filling operations, the indeterminate points follow each other in two
sequences, one on the left side of α and the other on its right side. If (i, j) is
an indeterminate point of the left sequence, then (i, j + h_i) belongs to the right
sequence (see Fig. 9) and these points are related to each other; let us assume
that there is a discrete set F of ℱ having X-rays equal to H, V, D, then:

- if (i, j) ∈ F, then (i, j + h_i) ∉ F,

- if (i, j) ∉ F, then (i, j + h_i) ∈ F.

As a result, the number of indeterminate points belonging to F is equal to the
number of indeterminate points not belonging to F. This means that, in order for
the conditions given by the horizontal X-ray to be satisfied, half of the indeterminate
points have to be in F. If there is at least one j such that the j-th column of β is longer
than 2v_j, the number of its indeterminate points belonging to F has to be less
than the number of its indeterminate points not belonging to F. Therefore, less
than half of the indeterminate points are in F. This is a contradiction and so β
satisfies (I). By proceeding in the same way, we prove that β satisfies (II).
Consequently, we can reduce our problem to a 2-Satisfiability problem. We prove
the same result for the cases a and c in the same way. Therefore, the algorithm
solves the problem for all H, V, D instances.
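
The relation between paired indeterminate points, namely that exactly one of (i, j) and (i, j + h_i) belongs to F, is expressible with two clauses of two literals each. The Python sketch below shows only this ingredient and is not the full reduction of [3]; the signed-integer encoding of literals and the function names are assumptions made for the illustration.

```python
from itertools import product


def exactly_one(x, y):
    """2-CNF clauses forcing exactly one of the variables x, y to be true.
    Literals are signed integers: +v means v, -v means not v."""
    return [(x, y), (-x, -y)]


def satisfied(clauses, assignment):
    """assignment maps a variable number to a bool."""
    def value(lit):
        return assignment[abs(lit)] if lit > 0 else not assignment[abs(lit)]
    return all(value(a) or value(b) for a, b in clauses)


# Variable 1: point (i, j) is in F; variable 2: point (i, j + h_i) is in F.
clauses = exactly_one(1, 2)
models = [(a, b) for a, b in product([False, True], repeat=2)
          if satisfied(clauses, {1: a, 2: b})]
print(models)
```

Only the two assignments in which the points disagree survive, which is the behaviour stated by the two implications above.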

We can summarize the reconstruction algorithm as follows. 

Input: Three vectors H ∈ N^m, V ∈ N^n and D ∈ N^{n+m−1};

Output: a discrete set F of ℱ such that X_(1,0) F = H, X_(0,1) F = V, X_(1,1) F = D,
or a message that there is no such set;

1. check if H, V and D satisfy conditions (3.1) and (3.3);

2. compute the points Q_1 and Q_2;

3. compute the cumulated sums H_i, V_j, D_k, for i = 1, ..., m, j = 1, ..., n and
k = 1, ..., n + m − 1;

4. compute the 6-path from Q_1 to Q_2;

5. repeat

5.1. choose a pair of base-points (B_1, B_2) belonging to two
opposite sides of A;

5.2. compute the rectangles R_1 and R_2;

5.3. α := R_1 ∪ R_2 ∪ the 6-path from Q_1 to Q_2;

5.4. β := A;

5.5. repeat

5.5.1. perform the filling operations;
until α ⊄ β or α, β are invariant;

5.7. if α = β then F = α is a solution;

5.8. if α ⊂ β then reduce our problem to 2SAT;

until there is a discrete set of ℱ having X-rays equal to H, V, D or all the
base-point pairs have been examined.

We now examine the complexity of the algorithm described. Determining the
hex-connected set from B_1 to B_2 (i.e., Q_1, Q_2, the 6-path from Q_1 to Q_2, and
the rectangles R_1 and R_2) involves a computational cost of O(nm). In [13], the
author proposes a simple procedure for performing the filling operations whose
computational cost is O(nm(n + m)). This procedure gives a kernel and a shell
invariant with respect to the filling operations. If we obtain α ⊂ β, the algorithm
transforms our problem into a 2-Satisfiability problem and solves it in O(nm)
time. In case of failure, that is, when there is no discrete set of ℱ containing
B_1 and B_2 and having X-rays H, V, D, the algorithm chooses another pair of
base-points and performs the filling operations again. At most, it has to check
all the possible base-point pairs, that is O((n + m)^2) pairs. Consequently, the
algorithm decides if there is a discrete set of ℱ having X-rays H, V, D; if so, the
algorithm reconstructs one of them in O(nm(n + m)^3) time.

Theorem 2. Consistency((1, 0), (0, 1), (1, 1)) on ℱ can be solved in
O(nm(n + m)^3) time.
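
The bound can be read as the product of the two factors discussed in the complexity analysis: the number of base-point pairs and the cost of one reconstruction attempt, in which the filling operations dominate the O(nm) cost of the 2-Satisfiability step. The exponents below are reconstructed from that analysis and from the O(n^5) figure quoted for n = m, so they should be taken as a consistency check rather than as a new result.

```latex
\[
  \underbrace{O\bigl((n+m)^2\bigr)}_{\text{base-point pairs}}
  \cdot
  \underbrace{O\bigl(nm(n+m)\bigr)}_{\text{one attempt}}
  \;=\; O\bigl(nm(n+m)^3\bigr),
  \qquad
  n = m \;\Longrightarrow\; O\bigl(n \cdot n \cdot (2n)^3\bigr) = O(n^5).
\]
```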

Remark 1. The algorithm can be easily extended to contexts having more than
three X-rays and can reconstruct discrete sets convex in the directions of the
X-rays. This means that Consistency((1, 0), (0, 1), (1, 1), u_4, ..., u_k) on the class of
connected sets which are convex in all the directions (1, 0), (0, 1), (1, 1), u_4, ..., u_k
is solvable in polynomial time.
Abstract. We consider a new method to retrieve keys in a static table.
The keys of the table are stored in such a way that a binary search can
be performed more efficiently. An analysis of the method is performed
and empirical evidence is given that it actually works.



Keywords: binary searching, static dictionary problem. 

1 Introduction 

The present paper was motivated by some obvious observations concerning binary
searching when applied to static sets of strings; by following the directions
indicated by these observations, we obtained some improvements that reduce
the execution time of binary searching by 30-50%, and by even more for large tables.

Let us consider the set of the zodiac names

S = {capricorn, acquarius, pisces, aries, taurus, gemini, cancer, leo,
virgo, libra, scorpio, sagittarius}

and consider the problem of finding out if identifier x belongs or does not belong
to S. To do this, we can build a lexicographically ordered table T with the names
in S and then use binary searching to establish whether x ∈ S or not (see Table
1(a)). What is fine in binary searching is that, if the program is properly realized,
the number of character comparisons is kept to a minimum. For instance, the
case of aries is as follows: 1 comparison (a against l) is used to compare aries
and the median element leo; 1 comparison (a against c) is used to distinguish
aries from cancer; 2 comparisons (ar against ac) are used for aries against
acquarius. After that, the searched string is compared to the whole item aries.

The problem here is that we do not know in advance how many characters 
have to be used in each string-to-string comparison; therefore, every string com- 
parison requires a (comparatively) high number of housekeeping instructions, 
which override the small number of character comparisons and make the match- 
ing procedure relatively time consuming. 

Table 1. Two table structures for zodiac names 

The aim of this paper is to show that the housekeeping instructions can be almost 
eliminated by arranging the elements in a suitable way. First of all we have 
to determine the minimal number of characters able to distinguish the elements 
of the set: in the previous example the first three characters are sufficient to 
uniquely distinguish the strings in S, and this is not by chance. In fact, if we have a set 
S containing n strings on the alphabet $\Sigma$ with $|\Sigma| = \rho$, what we expect 
is that $\lceil \log_\rho n \rceil$ characters actually distinguish the elements in S. The problem 
is that in general these characters can be in different positions and we have to 
determine, string by string, where they are. What we wish to show here is that 
for any given set S, it is relatively easy to find out whether the elements in S can 
be distinguished using a small number of characters. In that case, we are then 
able to organize the set S in a table T such that a modified binary search can 
be performed with a minimum quantity of housekeeping instructions, thus improving 
the traditional searching procedure. Presently, we are also investigating 
how modified binary search compares to the method proposed by Bentley and 
Sedgewick [1]. To be more precise, in Section 2 we show that a pre-processing 
algorithm exists which determines the optimal organization of the table 
for the given set S, if any; the time for this pre-processing phase is in the order 
of $n(\log n)^3$. Then, in Section 3, we find the probability that the pre-processing 
algorithm gives a positive result, i.e., it finds the optimal table organization 
for set S. Finally, in Section 4, an actual implementation of our method is discussed 
and empirical results are presented showing the improvement achieved 
over traditional binary searching. 

2 The Pre-processing for Modified Binary Search 

Let us consider a set S of n elements or keys, each u characters long (some 
characters may be blank). We wish to store these elements in a table T in which 
binary searching can be performed with a minimum quantity of housekeeping 
instructions. To this purpose we need a vector V[1 ... n] containing, for every 
i = 1, 2, ..., n, the position from which the comparison, relative to the element 
T[i], has to begin. A possible Pascal-like implementation of this Modified Binary 
Searching (MBS) method is the following: 

function MBS(str: string): integer; 
var a, b, k, p: integer; found: boolean; 
begin a := 1; b := n; found := false; 
    while a ≤ b do 
        k := ⌊(a + b)/2⌋; 
        p := Compare(str, T[k], V[k], s); 
        if p = 0 
            then found := true; a := b + 1 
            else if p < 0 
                then b := k - 1 
                else a := k + 1 fi fi od; 
    if found and T[k] = str then MBS := k else MBS := 0 fi 
end; 

Here, the function Compare(A,B,i,s) compares the two strings A and B ac- 
cording to s characters beginning from position i; the result is —1, 0 or 1 accord- 
ing to the fact that the s characters in A are lexicographically smaller, equal or 
greater than the corresponding characters in B. 
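
The search just described can be sketched in Python as follows. This is only an illustrative translation, not the authors' program: the names `compare` and `mbs`, the blank-padding of the key, and the tiny hand-built table in the usage note are assumptions made for the example.

```python
def compare(a, b, i, s):
    """Compare the s characters of a and b starting at 1-based position i.

    Returns -1, 0 or 1, mirroring the Compare function described in the text.
    """
    wa, wb = a[i - 1:i - 1 + s], b[i - 1:i - 1 + s]
    return (wa > wb) - (wa < wb)


def mbs(key, T, V, s):
    """Modified binary search: 1-based index of key in T, or 0 if absent.

    T holds blank-padded strings of equal length (assumed non-empty);
    V[k-1] is the 1-based position at which the comparison with T[k-1] starts.
    """
    u = len(T[0])
    key = key.ljust(u)          # pad the key with blanks, like the table entries
    a, b, k = 1, len(T), 0
    found = False
    while a <= b:
        k = (a + b) // 2
        p = compare(key, T[k - 1], V[k - 1], s)
        if p == 0:
            found = True        # selector characters match: stop searching
            break
        if p < 0:
            b = k - 1
        else:
            a = k + 1
    # A match on the selector is only a candidate: confirm on the whole key.
    return k if found and T[k - 1] == key else 0
```

For instance, with the hypothetical table `T = ["aries ", "gemini", "virgo "]`, `V = [1, 1, 1]` and `s = 1`, the call `mbs("gemini", T, V, 1)` inspects a single character before the final whole-string check.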

The procedure that we are going to describe to build table T and vector V will 
be called PerfectDivision(S, n, s), where S and n are as specified above and s is 
the number of consecutive characters used to distinguish the various elements 
(in most cases $1 \le s \le 4$ is sufficient). It returns a table T of dimension n, the 
optimal table, containing the elements of S, and a vector V of dimension n such that, 
for all i from 1 to n, V[i] indicates the position from which the comparison has to 
begin. More precisely, the elements in table T are arranged in such a way that, 
if we binary search for the element $x = x_1 \ldots x_u$ and compare it with element 
$T[i] = T[i]_1 \ldots T[i]_u$, we only have to compare characters $x_{V[i]} \ldots x_{V[i]+s-1}$ with 
characters $T[i]_{V[i]} \ldots T[i]_{V[i]+s-1}$. 

The procedure consists of two main steps, the first of which tries to find out 
the element in S which determines the subdivision of the keys into two subtables. 
The second step recursively calls PerfectDivision to actually construct the two 
subtables. 

We begin by giving the definition of a selector: 

Definition 1 Let S be a set containing n u-length strings. The pair $(i, \alpha^{[s]} = a_1 a_2 \ldots a_s)$, 
$i \in [1 \ldots u - s + 1]$, is an $s^{[i]}$-selector for S iff: 

1) $\exists!\, w \in S : w_i w_{i+1} \ldots w_{i+s-1} = a_1 a_2 \ldots a_s$; 

2) $\exists\, \lfloor (n-1)/2 \rfloor$ keys $y \in S : y_i y_{i+1} \ldots y_{i+s-1} < a_1 a_2 \ldots a_s$; 

3) $\exists\, \lceil (n-1)/2 \rceil$ keys $y \in S : y_i y_{i+1} \ldots y_{i+s-1} > a_1 a_2 \ldots a_s$. 



Theorem 1 Let S be a set containing n u-length strings. If n = 1 or n = 2 
then S always admits a selector. 

Proof: If S = {x} then it admits the $1^{[1]}$-selector $(1, x_1)$. If S = {x, y}, it admits 
a $1^{[p]}$-selector $(p, z_p)$ with $p = \min\{i \in [1 \ldots u] : x_i \ne y_i\}$ and $z_p = \min\{x_p, y_p\}$ 
(such a p exists since x and y are distinct elements). We observe that if S admits 
a $1^{[p]}$-selector then it also admits an $s^{[q]}$-selector $\forall q$ with $p + 1 - s \le q \le p$. ■ 

More generally, for n > 2 we can give an integer function FindSelector(var 
T, a, b, s) to determine whether or not the elements from position a to position 
b in a table T containing b - a + 1 u-length strings admit an s-selector; after a 
call to this function, if the selector has been found and $k = \lfloor (a + b)/2 \rfloor$ is the 
index of the median element, then the elements in T are arranged in such a way 
that: 

1. the median element T[k] contains $\alpha^{[s]}$ starting at position i; 

2. the elements T[a], ..., T[k - 1] are less than T[k] when we compare the 
characters from position i to position i + s - 1 with $\alpha^{[s]}$; 

3. the elements T[k + 1], ..., T[b] are greater than T[k] when we compare the 
characters from position i to position i + s - 1 with $\alpha^{[s]}$. 

The function returns the position i = sel at which the selector begins, if any, 0 
otherwise. This value is stored in vector V by procedure PerfectDivision after 
FindSelector is called. 

FindSelector calls, in turn, a procedure, Select(var T, a, b, sel, s), and a 
boolean function, Equal(x, y, sel, s), which operate in the following way: 

- procedure Select arranges the elements in table T, from position a to position 
b, by comparing, for each element, the s characters beginning at position 
sel; the arrangement is such that the three conditions 1., 2. and 3. above are 
satisfied. As is well known, the selection problem to find the median element 
can be solved in linear time; however, in our algorithm we decided to use a 
heapsort procedure to completely sort the elements in T. This is slower, but 
safer¹ than existing selection algorithms, and the ordering could be used at 
later stages; 

- function Equal compares the s characters of the strings x, y, beginning at 
position sel, and returns true if the compared characters are equal, false 
otherwise. It corresponds to Compare(x, y, sel, s) = 0 above. 

¹ The algorithm of Rivest and Tarjan [3, vol. 3, p. 216] can only be applied to large 
sets; the algorithm of selection by tail recursion [2, p. 230] is linear on the average 
but is quadratic in the worst case. 

function FindSelector(var T, a, b, s): integer; 
var sel, i: integer; 
begin 
    sel := 1; 
    if b - a ≥ 2 then 
        FindSelector := 0; 
        i := ⌊(a + b)/2⌋; 
        while sel ≤ (u - s + 1) do 
            Select(T, a, b, sel, s); 
            {the median must differ from both of its neighbours} 
            if Equal(T[i], T[i + 1], sel, s) or Equal(T[i], T[i - 1], sel, s) then 
                sel := sel + 1 
            else 
                FindSelector := sel; 
                sel := u + 1 {to force exit from the loop} 
            fi 
        od 
    else if b - a = 1 then 
        FindSelector := 0; 
        while sel ≤ (u - s + 1) do 
            Select(T, a, b, sel, s); 
            if Equal(T[a], T[b], sel, s) then 
                sel := sel + 1 
            else 
                FindSelector := sel; 
                sel := u + 1 {to force exit from the loop} 
            fi 
        od 
    else 
        FindSelector := sel 
    fi 
end {FindSelector}; 

procedure PerfectDivision(var T, a, b, s); 
var k, i: integer; 
begin 
    k := FindSelector(T, a, b, s); 
    if k = 0 then 
        fail 
    else 
        i := ⌊(a + b)/2⌋; 
        V[i] := k; 
        if a < i then PerfectDivision(T, a, i - 1, s) fi; 
        if i < b then PerfectDivision(T, i + 1, b, s) fi 
    fi 
end {PerfectDivision}; 

If we want to apply PerfectDivision to a set S of n u-length strings, we first 
store the elements of S in a table T, then call PerfectDivision(T, 1, n, s) by choosing 
an appropriate value for s (in practice, we can start with s = 1 and then increase 
its value if the procedure fails). 
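
The pre-processing can be sketched in Python along the following lines. This is a simplified illustration under stated assumptions, not the authors' program: Python's built-in stable sort stands in for the heapsort-based Select, indices are 0-based (while the selector positions stored in V remain 1-based), and the helper names `find_selector`, `perfect_division` and `build_table` are hypothetical.

```python
def find_selector(T, a, b, s, u):
    """Sort T[a..b] on successive s-character windows and return the first
    1-based window position at which the median differs from its neighbours
    inside the range; return 0 if no such position exists."""
    k = (a + b) // 2
    for sel in range(1, u - s + 2):
        key = lambda w: w[sel - 1:sel - 1 + s]
        T[a:b + 1] = sorted(T[a:b + 1], key=key)   # stands in for Select
        left_ok = k == a or key(T[k - 1]) != key(T[k])
        right_ok = k == b or key(T[k + 1]) != key(T[k])
        if left_ok and right_ok:
            return sel
    return 0


def perfect_division(T, V, a, b, s, u):
    """Recursively arrange T[a..b] and fill V; return False on failure."""
    sel = find_selector(T, a, b, s, u)
    if sel == 0:
        return False
    k = (a + b) // 2
    V[k] = sel
    if a < k and not perfect_division(T, V, a, k - 1, s, u):
        return False
    if k < b and not perfect_division(T, V, k + 1, b, s, u):
        return False
    return True


def build_table(words, s):
    """Pad the words with blanks to a common length u and try to divide them."""
    u = max(len(w) for w in words)
    T = [w.ljust(u) for w in words]
    V = [0] * len(T)
    ok = perfect_division(T, V, 0, len(T) - 1, s, u)
    return ok, T, V


ZODIAC = ["capricorn", "acquarius", "pisces", "aries", "taurus", "gemini",
          "cancer", "leo", "virgo", "libra", "scorpio", "sagittarius"]
ok, T, V = build_table(ZODIAC, 1)
```

With s = 1 the sketch places capricorn at the median (sixth position, 1-based) with selector position 5, which agrees with the values V[6] = 5 and V[9] = 2 quoted in the text for the zodiac example.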

This procedure applied to the set of zodiac names by using s = 1 returns the 
table T and the vector V as in Table 1(b). Let us take into consideration again 
the case of aries: we start with the median element and since V[6] = 5 we have 





Donatella Merlini, Renzo Sprugnoli, and M. Cecilia Verri 



to compare the fifth character of aries, s, with the fifth character of capricorn, 
i; s > i and we proceed with the new median element scorpio. Since V[9] = 2 
we compare r with c; in the next step the median element is gemini and we 
compare the first character of aries with the first one of gemini. After that, the 
searched string is compared to the whole string aries and the procedure ends 
with success. 

For the set of the names of the first twenty numbers, the procedure fails with 
s = 1 and, with s = 2, returns the arrangement shown in Table 2. 



Table 2. Table T and vector V when s = 2 for the first twenty numbers’ names 

The complexity of PerfectDivision is given by the following Theorem: 

Theorem 2 If S is a set of n words having length u and we set d = u - s + 1, 
then the complexity of the procedure PerfectDivision is at most $\ln 2 \cdot d\,n \log_2^2 n$ if 
we use Heapsort as a selection algorithm, and $O(dn \log_2 n)$ on the average if we 
use a linear selection algorithm. 

Proof: The most expensive part of PerfectDivision is the selection phase, while 
all the rest is performed in constant time. If we use HeapSort as an in-place 
sorting procedure, the time is $A n \log_2 n$, where $A \approx 2 \ln 2$. This procedure is 
executed at most d times. Let $C_n$ be the total time for finding a complete division 
for our set of n elements; clearly we have: 

$$C_n \le A d n \log_2 n + 2 C_{n/2}.$$ 

There are general methods, see [4], to find an estimate for $C_n$; however, let us 
suppose $n = 2^k$, for some k, and set $c_k = C_{2^k}$; then we have $c_k = A d 2^k k + 2 c_{k-1}$. 
By unfolding this recurrence, we find: 

$$c_k = A d 2^k k + 2 A d 2^{k-1}(k-1) + 4 A d 2^{k-2}(k-2) + \cdots = A d 2^k (k + (k-1) + \cdots + 1) = A d 2^{k-1} k(k+1).$$ 

Returning to $C_n$: 

$$C_n \approx \frac{A d n}{2} \log_2 n\,(\log_2 n + 1) = O(dn(\log_2 n)^2).$$ 

Obviously, if we use a linear selection algorithm the time is reduced by a factor 
$O(\log_2 n)$. Finally, we observe that $d \approx \log_2 n$ and this justifies our statement in 
the Introduction. ■ 
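
As a quick sanity check of the unfolding step, the recurrence $c_k = A d 2^k k + 2 c_{k-1}$ can be iterated numerically and compared with the closed form $A d 2^{k-1} k(k+1)$. The sketch below assumes the base case $c_0 = 0$, which the proof does not state explicitly; the function names are illustrative only.

```python
def division_cost(k, A=1.0, d=1.0):
    """Iterate c_j = A*d*2**j*j + 2*c_{j-1}, starting from c_0 = 0 (assumed)."""
    c = 0.0
    for j in range(1, k + 1):
        c = A * d * 2 ** j * j + 2 * c
    return c


def closed_form(k, A=1.0, d=1.0):
    """Closed form A*d*2**(k-1)*k*(k+1) obtained by unfolding the recurrence."""
    return A * d * 2 ** (k - 1) * k * (k + 1)
```

For example, k = 3 gives 48 in both cases when A = d = 1.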



The next section is devoted to the problem of evaluating the probability that 
a perfect subdivision for the table T exists. 

3 The Analysis 

We now perform an analysis of the pre-processing phase of modified binary 
searching. Let $\Sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_\rho\}$ be a finite alphabet and let its elements be 
ordered in some way, for instance $\sigma_1 < \sigma_2 < \cdots < \sigma_\rho$. In real situations, $\Sigma$ can 
be the Latin alphabet, with $\rho = 26$ or $\rho = 27$ (if we also consider the space), and its 
ordering is the usual lexicographical order; so, on a computer, $\Sigma$ is a subset of the 
ASCII codes and if we have s = 3 then we should consider triples of letters. We 
can now abstract and define $A = \Sigma^s$ for the specific value of s, with the order 
induced by the order in $\Sigma$. Then we hide s and set $A = \{a_1, a_2, \ldots, a_r\}$, 
where $r = \rho^s$. Finally, we observe that if S is any set of strings, $S \subset \Sigma^u$, for a 
given starting index the subwords of the words in S beginning at that position 
and composed of s consecutive characters are a multiset S over A. What we are 
looking for is a suitable division of this multiset. 

Therefore, let us consider the multiset S, with $|S| = n$, over our abstract 
alphabet A. Our problem is to find the probability that an element $a_m \in A$ exists 
such that: i) $a_m \in S$, but has no duplicate in S; ii) there are exactly $\lfloor (n+1)/2 \rfloor - 1$ 
elements $a_i \in S$ such that $a_i < a_m$ (i.e., these elements constitute a multiset over 
$\{a_1, a_2, \ldots, a_{m-1}\}$); iii) there are exactly $\lceil (n+1)/2 \rceil - 1$ elements $a_j$ in S such that 
$a_j > a_m$ (i.e., these elements constitute a multiset over $\{a_{m+1}, \ldots, a_r\}$). In a 
more general way we can define the following three (m, p)-separation conditions 
for a multiset S as above, relative to an element $a_m \in A$ (see also Definition 1): 

i) $a_m \in S$ and has no duplicate in S; 

ii) there are (p - 1) elements in S preceding $a_m$; 

iii) there are (n - p) elements in S following $a_m$. 

Our first important result is the following: 

Theorem 3 Let $\pi(n, p, r, m)$ be the probability that a multiset S on A (S and 
A as above) contains a specific element $a_m$ for which the (m, p)-separation conditions 
are satisfied; then 

$$\pi(n, p, r, m) = \frac{p}{r^n} \binom{n}{p} (m-1)^{p-1} (r-m)^{n-p}.$$ 

Proof: Let us count the total number $P(n, p, r, m)$ of the multisets satisfying 
the three (m, p)-separation conditions for a given element $a_m \in A$ ($1 \le m \le r$). 
If we imagine that the elements in S are sorted according to their order 
in A, the first p - 1 elements must belong to the subset $\{a_1, a_2, \ldots, a_{m-1}\}$, and 
therefore $(m-1)^{p-1}$ such submultisets exist. In the same way, the last n - p 
elements in S must belong to $\{a_{m+1}, \ldots, a_r\}$ and therefore $(r-m)^{n-p}$ such 
submultisets exist. Since every first part can be combined with each second 
part, and the pth character must equal $a_m$ by hypothesis, there exists a total 
of $(m-1)^{p-1}(r-m)^{n-p}$ ordered multisets of the type described. The original 
multisets, however, are not sorted, and the elements may assume any position 
in S. The different combinations are counted by a simple trinomial coefficient, 
which reduces to a binomial coefficient: 

$$\binom{n}{p-1,\,1,\,n-p} = \frac{n!}{(p-1)!\,1!\,(n-p)!} = p\,\frac{n!}{p!\,(n-p)!} = p\binom{n}{p}.$$ 

We conclude that the total number of multisets we are looking for is: 

$$P(n, p, r, m) = p \binom{n}{p} (m-1)^{p-1} (r-m)^{n-p}.$$ 

Finally, the corresponding probability is found by dividing this expression by $r^n$, 
the total number of multisets with n elements. ■ 



An immediate corollary of this result gives us the probability that, given S, 
an element $a_m$ exists that satisfies the (m, p)-separation conditions: 

Theorem 4 Given a multiset S as above, the probability $\pi(n, p, r)$ that an element 
$a_m \in A$ exists for which the (m, p)-separation conditions hold true is: 

$$\pi(n, p, r) = \frac{p}{r^n} \binom{n}{p} \sum_{m=1}^{r} (m-1)^{p-1} (r-m)^{n-p}.$$ 

Proof: Any $a_m \in A$ could satisfy the separation conditions, but for a fixed p 
the corresponding events are mutually exclusive: the element occupying the pth 
position of the sorted multiset is uniquely determined, so a single multiset cannot 
contain two distinct elements satisfying the conditions. Therefore, the probability 
is the sum of the single probabilities. ■ 
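
The sum of Theorem 4 is easy to evaluate exactly for small parameters. The following Python sketch assumes the formula $\pi(n,p,r) = \frac{p}{r^n}\binom{n}{p}\sum_{m=1}^{r}(m-1)^{p-1}(r-m)^{n-p}$ as reconstructed from the proof of Theorem 3, with the convention $0^0 = 1$; the function name `pi_exact` is illustrative.

```python
from math import comb


def pi_exact(n, p, r):
    """Probability that some element satisfies the (m, p)-separation
    conditions, summing the Theorem 3 probabilities over m = 1..r.
    Python evaluates 0**0 as 1, which is the convention needed here."""
    total = sum((m - 1) ** (p - 1) * (r - m) ** (n - p) for m in range(1, r + 1))
    return p * comb(n, p) * total / r ** n
```

For n = 12 and r = 20 this evaluates to about 0.7274 at p = 1, in line with the "exact" column of Table 3.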



The formula of this theorem is not very appealing and does not give an 
intuitive idea of how the probability $\pi(n, p, r)$ varies with p; in particular, what 
it is when $p = \lfloor (n+1)/2 \rfloor$, the case we are interested in. In order to arrive 
at a more intelligible formula, we are going to approximate it by obtaining its 
asymptotic value. To do that, we must suppose n < r, and this hypothesis 
will be understood in the sequel. In real situations, where $r = |A| = 26^s$ or 
$r = |A| = 27^s$, we simply have to choose a selector of s characters with: 

- s = 1 for tables up to n = 22 elements; 

- s = 2 for tables up to n = 570 elements; 

- s = 3 for tables up to n = 15,000 elements; 

- s = 4 for tables up to n = 400,000 elements. 

These numbers correspond to $n/r \approx 0.85$ if $|A| = 26^s$ and to $n/r \approx 0.8$ if 
$|A| = 27^s$. Obviously, a selector with s > 4 can result in a worsening with 
respect to traditional binary searching. On the other hand, very large tables 
should reside in secondary storage. 

A curious fact is that the probabilities $\pi(n, p, r)$ are almost independent of 
p; as we are now going to show, the dependence on p can only be appreciated 
when p is very near to 1 or very near to n. 



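Theorem 5 The probability $\pi(n, p, r)$ of the previous theorem has the following 
asymptotic approximations (n < r): 

$$\pi(n, p, r) = \left(1 - \frac{1}{r}\right)^n \left(1 + O(\cdots)\right) \quad \text{when } 2 < p < n - 2; \tag{3.1}$$ 

$$\pi(n, 2, r) = \pi(n, n-1, r) = \left(1 - \frac{1}{r}\right)^n \left(1 - \frac{n(n-1)}{12(r-1)^2} + O(\cdots)\right); \tag{3.2}$$ 

$$\pi(n, 1, r) = \pi(n, n, r) = \left(1 - \frac{1}{r}\right)^n \left(1 + \frac{n}{2(r-1)} + \frac{n(n-1)}{12(r-1)^2} + O(\cdots)\right). \tag{3.3}$$ 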



Proof: Let us apply the Euler-Maclaurin summation formula to approximate 
the sum $\sum_{m=1}^{r} (m-1)^{p-1}(r-m)^{n-p}$. By writing m as the continuous variable 
x, we have: 

$$\sum_{m=1}^{r} (m-1)^{p-1}(r-m)^{n-p} = \sum_{m=1}^{r-1} (m-1)^{p-1}(r-m)^{n-p} = \int_1^r (x-1)^{p-1}(r-x)^{n-p}\,dx + B_1 [f(x)]_1^r + \frac{B_2}{2!} [f'(x)]_1^r + \cdots$$ 

Here, $f(x) = (x-1)^{p-1}(r-x)^{n-p}$ and we immediately observe that for p > 1 we 
have $[f(x)]_1^r = 0$. In the same way, the first (p - 1) derivatives are 0 and, in order 
to find suitable approximations for $\pi(n, p, r)$, we are reduced to the three cases 
p = 1, p = 2 and p > 2. By symmetry, we also have $\pi(n, p, r) = \pi(n, n - p + 1, r)$. 

Since f(x) is a simple polynomial in x, the integral can be evaluated as 
follows: 

$$\int_1^r (x-1)^{p-1}(r-x)^{n-p}\,dx = \int_0^{r-1} y^{p-1}(r-1-y)^{n-p}\,dy = \sum_{k=0}^{n-p} \binom{n-p}{k} (-1)^k (r-1)^{n-p-k} \int_0^{r-1} y^{p+k-1}\,dy = (r-1)^n \sum_{k=0}^{n-p} \binom{n-p}{k} \frac{(-1)^k}{p+k} = \frac{(r-1)^n}{p\binom{n}{p}}.$$ 

The last sum in this derivation is well-known and represents the inverse of a 
binomial coefficient (see Knuth [3, vol. 1, p. 71]). Therefore we obtain formula 
(3.1) for 2 < p < n - 2. For p = 2 we have a contribution from the first derivative: 

$$[f'(x)]_1^r = \left[(r-x)^{n-2} - (x-1)(n-2)(r-x)^{n-3}\right]_1^r = -(r-1)^{n-2}$$ 

and therefore we have formula (3.2). Finally, for p = 1 we also have a contribution 
from $[f(x)]_1^r$, and find formula (3.3). ■ 



Table 3 illustrates the probabilities $\pi(n, p, r)$ for n = 12 and r = 20. We 
used a computer program for simulating the problem, and in the table we show 
the probabilities found by simulation, the exact probabilities given by Theorem 
4, and the approximate probabilities as computed by formulas (3.1), (3.2) and 
(3.3). 
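
The leading terms of the three approximations can be evaluated with a few lines of Python. The sketch assumes the forms $(1-1/r)^n$, $(1-1/r)^n\,(1 - n(n-1)/(12(r-1)^2))$ and $(1-1/r)^n\,(1 + n/(2(r-1)) + n(n-1)/(12(r-1)^2))$ for formulas (3.1), (3.2) and (3.3), with the error terms dropped; `pi_approx` is an illustrative name.

```python
def pi_approx(n, p, r):
    """Leading terms of formulas (3.1)-(3.3); the symmetry p <-> n - p + 1
    maps p = n to case (3.3) and p = n - 1 to case (3.2)."""
    base = (1 - 1 / r) ** n
    q = min(p, n - p + 1)
    if q == 1:
        return base * (1 + n / (2 * (r - 1)) + n * (n - 1) / (12 * (r - 1) ** 2))
    if q == 2:
        return base * (1 - n * (n - 1) / (12 * (r - 1) ** 2))
    return base
```

For n = 12 and r = 20 the three cases give approximately 0.7275, 0.5239 and 0.5404, matching the "approx." column of Table 3 to the digits shown there.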

Formula (3.1), in the short version $\pi(n, p, r) \approx (1 - 1/r)^n$, allows us to obtain 
an estimate of the probability of finding a division with $p = \lfloor (n+1)/2 \rfloor$ for a given 
set S of identifiers or words over some language. In general, several positions d 
of the identifiers are available for a division (function FindSelector finds the first 
one), and this increases the probability of success. Usually, if s is the length of 
the selector and u the identifiers' length, we have d = u - s + 1, and in general: 

Table 3. Simulation, exact and approximate probabilities 

SIMULATION FOR MODIFIED BINARY SEARCHING 

Number of alphabet elements (r): 20 
Number of multiset elements (n): 12 
Number of simulation trials: 30000 

  p    simulat.      exact         approx. 
  1    0.72176667    0.72739722    0.72746538 
  2    0.51906667    0.52409881    0.52389482 
  3    0.54113333    0.54015736    0.54036009 
  4    0.53856667    0.54042599    0.54036009 
  5    0.53563333    0.54036133    0.54036009 
  6    0.54176667    0.54035984    0.54036009 
  7    0.54300000    0.54035984    0.54036009 
  8    0.54173333    0.54036133    0.54036009 
  9    0.53923333    0.54042599    0.54036009 
 10    0.53963333    0.54015736    0.54036009 
 11    0.52596667    0.52409881    0.52389482 
 12    0.72970000    0.72739722    0.72746538 


Theorem 6 Let S be a multiset of identifiers over A, with $|S| = n$ and $|A| = r$. 
If d positions in the elements of S are available for division, then the probability 
that an element $a_m \in S$ exists satisfying the (m, p)-separation conditions with 
$p = \lfloor (n+1)/2 \rfloor$ is: 

$$\sigma(n, r, d) = 1 - (1 - \pi(n, p, r))^d \approx 1 - \left(1 - \left(1 - \frac{1}{r}\right)^n\right)^d. \tag{3.4}$$ 

This quantity can be approximated by: 

$$\sigma(n, r, d) \approx 1 - \left(\frac{n}{r}\right)^d \exp\left(-\frac{d(n-1)}{2r}\right). \tag{3.5}$$ 

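Proof: We can consider the d positions as independent of each other, and 
therefore equation (3.4) follows immediately. To prove (3.5) we need some computations. 
First of all we have: 

$$\left(1 - \frac{1}{r}\right)^n = \exp\left(n \ln\left(1 - \frac{1}{r}\right)\right) = \exp\left(-\frac{n}{r} - \frac{n}{2r^2} - \frac{n}{3r^3} - \cdots\right).$$ 

By expanding the exponentials, we find: 

$$1 - e^{-n/r} e^{-n/(2r^2)} \cdots = \frac{n}{r} - \frac{n(n-1)}{2r^2} + \frac{n(n^2 - 3n + 2)}{6r^3} + O(\cdots) = \frac{n}{r}\left(1 - \frac{n-1}{2r} + \frac{n^2 - 3n + 2}{6r^2} + O(\cdots)\right).$$ 

This formula can be written as: 

$$1 - \left(1 - \frac{1}{r}\right)^n = \frac{n}{r} \exp\left(-\frac{n-1}{2r}\right)\left(1 + O(\cdots)\right)$$ 

and therefore: 

$$\left(1 - \left(1 - \frac{1}{r}\right)^n\right)^d \approx \left(\frac{n}{r}\right)^d \exp\left(-\frac{d(n-1)}{2r}\right),$$ 

which immediately gives equation (3.5). ■ 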



Having found a division for S, we are not yet finished, because the procedure has 
to be recursively applied to the subsets obtained from S (see procedure PerfectDivision). 
Since n/2 is the approximate size of these two subsets, the probability 
of finding a division for each of them is: 

$$\sigma(n/2, r, d) \approx 1 - \left(\frac{n}{2r}\right)^d \exp\left(-\frac{d(n-2)}{4r}\right);$$ 

in fact, except in very particular cases, the number of possible positions 
at which division can take place is not diminished. The joint probability that 
both subsets can be divided is $\sigma(n/2, r, d)^2$ and the probability of obtaining a 
complete division of S, so that modified binary searching is applicable, is: 

$$\tau(n, r, d) = \sigma(n, r, d)\,\sigma(n/2, r, d)^2\,\sigma(n/4, r, d)^4 \cdots,$$ 

the product extended up to $\sigma(n/2^k, r, d)^{2^k}$ such that $n/2^k \ge 3$. In fact, as observed 
in Theorem 1, a division for tables with 1 or 2 elements is always possible 
(i.e., $\sigma = 1$). 
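
To get a feeling for these quantities, the probabilities can be evaluated numerically. The Python sketch below assumes the forms $\sigma(n,r,d) \approx 1 - (1 - (1-1/r)^n)^d$ for (3.4), $\sigma(n,r,d) \approx 1 - (n/r)^d \exp(-d(n-1)/(2r))$ for (3.5), and the product $\sigma(n)\,\sigma(n/2)^2\,\sigma(n/4)^4 \cdots$ stopped when the subtable size drops below 3; the function names and the sample parameters are illustrative.

```python
from math import exp


def sigma(n, r, d):
    """Probability of a division at some of d positions, second form of (3.4)."""
    return 1 - (1 - (1 - 1 / r) ** n) ** d


def sigma_approx(n, r, d):
    """Exponential approximation (3.5)."""
    return 1 - (n / r) ** d * exp(-d * (n - 1) / (2 * r))


def tau(n, r, d):
    """Probability of a complete division: product of sigma(n / 2**k) ** (2**k)
    over all levels k whose subtable size n / 2**k is still at least 3."""
    prob, size, copies = 1.0, float(n), 1
    while size >= 3:
        prob *= sigma(size, r, d) ** copies
        size /= 2
        copies *= 2
    return prob
```

With the sample values n = 12, r = 26 and d = 9 (roughly the zodiac setting with s = 1), `sigma` and `sigma_approx` differ by less than 1e-4, and `tau` is only slightly smaller than `sigma`.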

The probability $\tau(n, r, d)$ is the quantity we are mainly interested in. Obviously, 
$\tau(n, r, d) \le \sigma(n, r, d)$ but, fortunately, we can show that these probabilities 
are almost equal, at least in the most important cases. 



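Theorem 7 If $\sigma(n, r, d)$ is sufficiently near to 1, then $\tau(n, r, d) \approx \sigma(n, r, d)$. 

Proof: First of all, let us develop $\tau(n, r, d)$: 

$$\tau(n, r, d) \approx \left(1 - \left(\frac{n}{r}\right)^d \exp\left(-\frac{d(n-1)}{2r}\right)\right)\left(1 - \left(\frac{n}{2r}\right)^d \exp\left(-\frac{d(n-2)}{4r}\right)\right)^2 \cdots \approx 1 - \left(\frac{n}{r}\right)^d \exp\left(-\frac{d(n-1)}{2r}\right) - 2\left(\frac{n}{2r}\right)^d \exp\left(-\frac{d(n-2)}{4r}\right) - \cdots$$ 

What we now wish to show is that the first term (after 1) dominates all the 
others, and therefore $\tau(n, r, d) \approx \sigma(n, r, d)$. Let us consider the ratio between 
the first and the second term after the 1: 

$$\frac{(n/r)^d \exp\left(-d(n-1)/(2r)\right)}{2\,(n/(2r))^d \exp\left(-d(n-2)/(4r)\right)} = 2^{d-1} \exp\left(\frac{dn - 2d - 2dn + 2d}{4r}\right) = 2^{d-1} \exp\left(-\frac{dn}{4r}\right).$$ 

This quantity shows that the first term is much larger than the second, as 
claimed. ■ 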



4 Experimental Results 

In order to verify the effectiveness of our modified binary searching method, 
we devised some experiments comparing traditional and modified binary search 
programs. We used Pascal as a programming language, well aware that this 
might not be the best choice. However, our aim was also to show how much better 
our method is when realised without any particular trick in a high level language. 
When the table is large, we need two or more characters for the selector; 
our implementation performs one-character-at-a-time comparisons; 
since the characters to be compared are consecutive, a machine language 
realization would instead perform a multi-byte load and compare, 
further reducing execution time. Presently, we are developing a C version of our 
programs to compare them to the most sophisticated realisations of traditional 
binary searching and to other, more recent approaches to string retrieval, such 
as the method of Bentley and Sedgewick [1]. 

We used the following program for traditional binary searching: 

function BS (str : string) : integer; 
var a, b, k : integer; found : boolean; 
begin 
  a := 1; b := n; found := false; 
  while a <= b do begin 
    k := (a+b) div 2; 
    if str = T[k] 
      then begin a := b+1; found := true end 
      else if str < T[k] then b := k-1 else a := k+1 
  end; 
  if found then BS := k else BS := 0 
end; 

For the modified binary searching, the program is: 

function MBS1 (str : string) : integer; 
var a, b, j, k : integer; found : boolean; 
begin 
  a := 1; b := n; found := false; 
  while a <= b do begin 
    k := (a+b) div 2; j := V[k]; 
    if str[j] = T[k,j] 
      then begin a := b+1; found := true end 
      else if str[j] < T[k,j] 
        then b := k - 1 
        else a := k + 1 end; 
  if found and (str = T[k]) 
    then MBS1 := k else MBS1 := 0 
end; 

This program is to be used when the selector length is 1; for a selector length 
of 2 we simply perform two cascaded ifs, and three for a selector length of 3. 

We performed 10 blocks of 20,000 searches of all the strings in a table, randomly 
chosen from a dictionary of 1,524 English words. For small tables (selector 
length equal to 1) we obtained the average times in the first part of Table 4 (the 
time unit is inessential; only relative times are of importance). For larger tables 
with selector length equal to 2 or 3 we obtained the times in parts two and three 
of Table 4. 



Table 4. Times for selector of length 1, 2 and 3 

 selector length 1            | selector length 2              | selector length 3 
  n    MBS     BS    gain (%) |   n    MBS      BS    gain (%) |    n    MBS      BS    gain (%) 
 10    29.7    56.2    47.2   |  10     33.0     54.8   39.8   |   50    100.0    228.4   56.2 
 11    33.9    63.6    46.7   |  20     69.9    141.1   50.5   |  100    222.8    541.8   58.9 
 12    36.3    70.9    48.8   |  30    108.3    234.4   53.8   |  150    346.0    887.6   61.0 
 13    38.9    80.4    51.6   |  40    150.5    345.4   56.4   |  200    472.4   1256.6   62.4 
 14    44.5    86.3    48.4   |  60    235.1    568.5   58.6   |  250    607.6   1625.4   62.6 
 15    47.1    93.8    49.8   |  80    326.8    825.0   60.4   |  300    751.2   2035.4   63.1 
 16    52.0   103.1    49.6   | 100    421.8   1084.2   61.1   |  350    883.4   2452.8   64.0 
 17    54.4   113.7    52.2   | 120    519.6   1340.6   61.2   |  400   1032.8   2864.8   63.9 
 18    58.3   123.1    52.6   | 140    619.5   1625.3   61.9   |  450   1151.4   3283.4   64.9 
 20    65.9   139.6    52.8   | 160    713.4   1919.7   62.8   |  500   1310.4   3702.0   64.6 
 22    73.3   159.5    54.0   | 180    806.5   2214.3   63.6   |  550   1447.8   4147.0   65.1 
 24    81.3   178.8    54.5   | 200    913.3   2510.2   63.6   |  600   1592.8   4610.2   65.5 
 26    87.3   197.8    55.9   | 225   1032.6   2877.0   64.1   |  700   1874.0   5534.2   66.1 
 28    94.4   215.9    56.3   | 250   1146.3   3248.7   64.7   |  800   2202.6   6461.4   65.9 
 30   101.6   232.8    56.4   | 275   1263.1   3648.8   65.4   |  900   2491.4   7377.8   66.2 
 32   110.0   254.8    56.8   | 300   1408.8   4060.6   65.3   | 1000   2760.6   8303.8   66.8 
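
The gain column of Table 4 is not defined explicitly in the text. The sample rows checked here are consistent with the relative time saving (BS - MBS)/BS expressed as a percentage, which the following Python sketch assumes; `gain_percent` is an illustrative name.

```python
def gain_percent(mbs_time, bs_time):
    """Relative saving of MBS over BS, in percent, rounded to one decimal.
    Assumed definition: 100 * (BS - MBS) / BS."""
    return round(100 * (bs_time - mbs_time) / bs_time, 1)
```

For the first row of the table, `gain_percent(29.7, 56.2)` reproduces the tabulated 47.2.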

5 Conclusions 

We have considered a variant of binary searching, which avoids most of the 
housekeeping instructions related to string comparisons. This requires a suitable 
pre-processing phase for the table to be searched, and therefore only applies to 
the static case. What we have shown is: 

1. The variant is considerably faster than traditional binary searching; we give 
empirical evidence of this fact, by comparing actual programs performing 
both kinds of binary searching. 

2. The pre-processing phase is fast, because it runs in time $O(n(\log_2 n)^3)$, and 
produces with a very high probability the optimal arrangement of the table 
elements. In any case, an almost optimal arrangement can always be found. 

In our opinion, an important aspect of our method is that it can be efficiently 
realised in a high level language; as is well-known, this is not always possible 
for other kinds of fast retrieval methods for static tables, such as perfect hashing. 
This makes our modified binary searching procedure attractive for actual 
implementation in real systems. 

Abstract. We present an efficient algorithm for the approximate median selection 
problem. The algorithm works in-place; it is fast and easy to implement. 
For a large array it returns, with high probability, a very close estimate of the true 
median. The running time is linear in the length n of the input. The algorithm performs 
fewer than |n comparisons and |n exchanges on the average. We present 
analytical results of the performance of the algorithm, as well as experimental 
illustrations of its precision. 

Keywords: Approximation algorithms, in-place algorithms, median selection, 
analysis of algorithms. 



1. Introduction 

In this paper we present an efficient algorithm for the in-place approximate median 
selection problem. There are several works in the literature treating the exact median 
selection problem (cf. [BFP*73], [DZ99], [FJ80], [FR75], [Hoa61], [HPM97]). Various 
in-place median finding algorithms have been proposed. Traditionally, the “comparison 
cost model” is adopted, where the only factor considered in the algorithm cost is the 
number of key-comparisons. The best upper bound on this cost found so far is nearly 
3n comparisons in the worst case (cf. [DZ99]). However, this bound and the 
nearly-as-efficient ones share the unfortunate feature that their nice asymptotic behaviour is “paid 
for” by extremely involved implementations. 

The algorithm described here approximates the median with high precision and 
lends itself to an immediate implementation. Moreover, it is quite fast: we show that 
it needs fewer than comparisons and exchanges on the average and fewer than 
|n comparisons and exchanges in the worst-case. In addition to its sequential efficiency, 
it is very easily parallelizable due to the low level of data contention it creates. 

The usefulness of such an algorithm is evident for all applications where it is suffi- 
cient to find an approximate median, for example in some heapsort variants (cf. [Ros97], 
[Kat96]), or for median-filtering in image representation. In addition, the analysis of its 
precision is of independent interest. 



G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC2000, LNCS 1767, pp. 226-238, 2000. 
© Springer-Verlag Berlin Heidelberg 2000 

We note that the procedure pseudomed in [BB96, §7.5] is similar to performing just 
one iteration of the algorithm we present (using quintets instead of triplets), as an aid in 
deriving a (precise) selection procedure. 

In a companion paper we show how to extend our method to approximate general 
k-selection. 

All the works mentioned above — as well as ours — assume the selection is from 
values stored in an array in main memory. The algorithm has an additional property 
which, as we found recently, has led to its being discovered before, albeit for solving a 
rather different problem. As is apparent on reading the algorithms presented in Section 
2, it is possible to perform the selection in this way “on the fly,” without keeping all 
the values in storage. At the extreme case, if the values are read in one-by-one, the 
algorithm only uses $\approx 4 \log_3 n$ positions (including $\lfloor \log_3 n \rfloor$ loop variables). This way 
of performing the algorithm is described in [RB90], in the context of estimating the 
median of an unknown distribution. The authors show there that the value thus selected 
is a consistent estimator of the desired parameter. They need pay no attention (and 
indeed do not) to the relation between the value the algorithm selects and the actual 
sample median. The last relation is the center point of interest for us. Curiously, Weide 
notes in [Wei78] that this approach provides an approximation of the sample median, 
though no analysis of the bias is provided. See [HM95] for further discussion of the 
method of Rousseeuw and Bassett, and numerous other treatments of the statistical 
problem of low-storage quantile (and in particular median) estimation. 
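The “on the fly” variant can be sketched as follows. This is only an illustration under our own assumptions: the function name and the per-level buffers are hypothetical, the stream length is required to be a power of 3, and the treatment of other lengths given in [RB90] is not reproduced. Each level keeps at most three pending values, so the storage grows like $\log_3 n$.

```python
def streaming_approximate_median(values):
    """Median-of-three selection over a stream whose length is a power of 3."""
    levels = []  # levels[k] buffers at most three medians of 3**k inputs each
    count = 0
    for x in values:
        count += 1
        k = 0
        while True:
            if k == len(levels):
                levels.append([])
            levels[k].append(x)
            if len(levels[k]) < 3:
                break
            x = sorted(levels[k])[1]  # the local median moves up one level
            levels[k].clear()
            k += 1
    # For a power of 3, exactly one value remains, at the top level.
    if count == 0 or any(levels[:-1]) or len(levels[-1]) != 1:
        raise ValueError("stream length must be a power of 3")
    return levels[-1][0]


if __name__ == "__main__":
    print(streaming_approximate_median(iter(range(27))))
```

Since consecutive triplets are consumed in arrival order, this selects the same element as the array-based procedure of Section 2.1 when it is run on the same sequence.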

In Section 2 we present the algorithm. Section 3 provides analysis of its run-time. In 
Section 4, to show the soundness of the method, we present a probabilistic analysis of 
the precision of its median selection. Since it is hard to glean the shape of the distribu- 
tion function from the analytical results, we provide computational evidence to support 
the conjecture that the distribution is asymptotically normal. In Section 5 we illustrate 
the algorithm with a few experimental results, which also demonstrate its robustness. 
Section 6 concludes the paper with suggested directions for additional research. 

An extended version of this paper is available by anonymous ftp from
ftp://ftp.cs.wpi.edu/pub/techreports/99-26.ps.gz.

2. The Algorithm 

It is convenient to distinguish two cases: 

2.1 The Size of the Input Is a Power of 3: $n = 3^r$

Let $n = 3^r$ be the size of the input array, with an integer r. The algorithm proceeds in r
stages. At each stage it divides the input into subsets of three elements, and calculates
the median of each such triplet. The “local medians” survive to the next stage. The algorithm
continues recursively, using the local results to compute the approximate median
of the initial set. To incur the fewest exchanges we do not move the chosen
elements from their original triplets. This adds some index manipulation operations, but
is typically advantageous. (While the order of the elements is disturbed, the contents of
the array are unchanged.)
Approximate Median Algorithm (1)

Triplet_Adjust(A, i, Step)

Let j = i + Step and k = i + 2·Step; this procedure moves the median of a triplet of
terms at locations i, j, k to the middle position.

    if (A[i] < A[j])
    then
        if (A[k] < A[i]) then Swap(A[i], A[j]);
        else if (A[k] < A[j]) then Swap(A[j], A[k]);
    else
        if (A[i] < A[k]) then Swap(A[i], A[j]);
        else if (A[k] > A[j]) then Swap(A[j], A[k]);

Approximate_Median(A, r)

This procedure returns the approximate median of the array A[0, ..., 3^r − 1].

    Step = 1; Size = 3^r;
    repeat r times
        i = (Step − 1)/2;
        while i < Size do
            Triplet_Adjust(A, i, Step);
            i = i + (3·Step);
        end while;
        Step = 3·Step;
    end repeat;
    return A[(Size − 1)/2];



Fig. 1. Pseudo-code for the approximate median algorithm, n = 3^r, r ∈ N.



In Fig. 1 we show pseudo-code for the algorithm. The procedure Triplet_Adjust finds
the median of triplets with elements that are indexed by two parameters: one, i, denotes
the position of the leftmost element of the triplet in the array. The second parameter, Step, is
the relative distance between the triplet elements. This approach requires that when the
procedure returns, the median of the triplet is in the middle position, possibly following
an exchange. The Approximate_Median algorithm simply consists of successive calls to
the procedure.
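As an illustration, the pseudo-code of Fig. 1 can be transcribed into Python roughly as below. This is a sketch rather than the C implementation used for the experiments; the function names are ours, and we assume the step is tripled after every stage so that the surviving medians stay equally spaced.

```python
def triplet_adjust(a, i, step):
    """Move the median of a[i], a[i+step], a[i+2*step] to the middle slot."""
    j, k = i + step, i + 2 * step
    if a[i] < a[j]:
        if a[k] < a[i]:
            a[i], a[j] = a[j], a[i]
        elif a[k] < a[j]:
            a[j], a[k] = a[k], a[j]
    else:
        if a[i] < a[k]:
            a[i], a[j] = a[j], a[i]
        elif a[k] > a[j]:
            a[j], a[k] = a[k], a[j]


def approximate_median(a, r):
    """Approximate median of a[0 .. 3**r - 1]; the list is reordered in place."""
    size, step = 3 ** r, 1
    if len(a) != size:
        raise ValueError("len(a) must equal 3**r")
    for _ in range(r):
        # Triplets of this stage start at (step-1)/2 and are 3*step apart.
        for i in range((step - 1) // 2, size, 3 * step):
            triplet_adjust(a, i, step)
        step *= 3
    return a[(size - 1) // 2]


if __name__ == "__main__":
    print(approximate_median(list(range(27)), 3))
```

The list keeps the same contents after the call; only the order of its elements changes.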



2.2 The Extension of the Algorithm to Arbitrary-Size Input

The method described in the previous subsection can be generalized to array sizes which
are not powers of 3. The basic idea is similar. Let n be the input size at the current stage,
where

$$n = 3t + k, \qquad k \in \{0, 1, 2\}.$$

We divide the input into (t − 1) triplets and a (3 + k)-tuple. The (t − 1) triplets are processed
by the same Triplet_Adjust procedure described above. The last tuple is sorted
(using an adaptation of selection-sort) and the median is extracted. The algorithm continues
iteratively using the results of each stage as input for a new one. This is done
until the number of local medians falls below a small fixed threshold. We then sort the
remaining elements and obtain the median. To symmetrize the algorithm, the array is
scanned from left to right during the first iteration, then from right to left on the second
one, and so on, changing the scanning direction at each iteration. This should reduce the
perturbation due to the different way in which the medians from the (3 + k)-tuples are
selected and improve the precision of the algorithm. Note that we chose to select the
second element out of four as the median (2 out of 1..4). We show pseudo-code for the
general case algorithm in Fig. 2.

Approximate Median Algorithm (2)

Selection_Sort(A, Left, Size, Step)

This procedure sorts Size elements of the array A located at positions Left, Left + Step,
Left + 2·Step, ..., Left + (Size − 1)·Step.

    for (i = Left; i < Left + (Size − 1)·Step; i = i + Step)
        min = i;
        for (j = i + Step; j < Left + Size·Step; j = j + Step)
            if (A[j] < A[min]) then min = j;
        end for;
        Swap(A[i], A[min]);
    end for;

Approximate_Median_AnyN(A, Size)

This procedure returns the approximate median of the array A[0, ..., Size − 1].

    LeftToRight = False; Left = 0; Step = 1;
    while (Size > Threshold) do
        LeftToRight = Not(LeftToRight);
        Rem = (Size mod 3);
        if (LeftToRight) then i = Left;
        else i = Left + (3 + Rem)·Step;
        repeat (Size/3 − 1) times
            Triplet_Adjust(A, i, Step);
            i = i + 3·Step;
        end repeat;
        if (LeftToRight) then Left = Left + Step;
        else
            i = Left;
            Left = Left + (1 + Rem)·Step;
        Selection_Sort(A, i, 3 + Rem, Step);
        if (Rem = 2) then
            if (LeftToRight) then Swap(A[i + Step], A[i + 2·Step])
            else Swap(A[i + 2·Step], A[i + 3·Step]);
        Step = 3·Step; Size = Size/3;
    end while;
    Selection_Sort(A, Left, Size, Step);
    return A[Left + Step·⌊(Size − 1)/2⌋];

Fig. 2. Pseudo-code for the approximate median algorithm, any n ∈ N.
Note: The reason we use a terminating tuple of size 4 or 5, rather than 1 or 2, is to keep
the equal spacing of elements surviving one stage to the next.

The procedure Selection_Sort takes as input four parameters: the array A, its size
and two integers, Left and Step. At each iteration Left points to the leftmost element
of the array which is in the current input, and Step is the distance between any two
successive elements in this input.

There are several alternatives to this approach for arbitrary-sized input. An attractive
one is described in [RB90], but it requires additional storage of approximately $4 \log_3 n$
memory locations.
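The general-case procedure of Fig. 2 can be sketched in Python as follows. The names and the default threshold of 8 (the value used in Section 5) are our choices for the illustration, and integer division stands for Size/3; this is not the authors' C code.

```python
def triplet_adjust(a, i, step):
    """Move the median of a[i], a[i+step], a[i+2*step] to the middle slot."""
    j, k = i + step, i + 2 * step
    if a[i] < a[j]:
        if a[k] < a[i]:
            a[i], a[j] = a[j], a[i]
        elif a[k] < a[j]:
            a[j], a[k] = a[k], a[j]
    else:
        if a[i] < a[k]:
            a[i], a[j] = a[j], a[i]
        elif a[k] > a[j]:
            a[j], a[k] = a[k], a[j]


def selection_sort(a, left, size, step):
    """Sort the `size` elements a[left], a[left+step], ... in place."""
    stop = left + size * step
    for i in range(left, stop - step, step):
        m = i
        for j in range(i + step, stop, step):
            if a[j] < a[m]:
                m = j
        a[i], a[m] = a[m], a[i]


def approximate_median_any_n(a, threshold=8):
    """Approximate median of a non-empty list of any length (reordered in place)."""
    size, left, step = len(a), 0, 1
    left_to_right = False
    while size > threshold:
        left_to_right = not left_to_right
        rem = size % 3
        i = left if left_to_right else left + (3 + rem) * step
        for _ in range(size // 3 - 1):
            triplet_adjust(a, i, step)
            i += 3 * step
        if left_to_right:
            left += step            # first surviving median
        else:
            i = left                # the (3+rem)-tuple sits at the left end
            left += (1 + rem) * step
        selection_sort(a, i, 3 + rem, step)
        if rem == 2:                # move the median of five to the kept slot
            if left_to_right:
                a[i + step], a[i + 2 * step] = a[i + 2 * step], a[i + step]
            else:
                a[i + 2 * step], a[i + 3 * step] = a[i + 3 * step], a[i + 2 * step]
        step *= 3
        size //= 3
    selection_sort(a, left, size, step)
    return a[left + step * ((size - 1) // 2)]
```

For inputs no longer than the threshold the function simply sorts and returns the exact (lower) median.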



3. Run-Time Analysis: Counting Moves and Comparisons 



Most of the work of the algorithm is spent in Triplet_Adjust, comparing values and exchanging
elements within triplets to locate their medians. We compute now the number
of comparisons and exchanges performed by the algorithm Approximate_Median.

Like all reasonable median-searching algorithms, ours has running time which is
linear in the array size. It is distinguished by the simplicity of its code, and hence it is
extremely efficient. We consider first the algorithm described in Fig. 1.

Let $n = 3^r$, $r \in \mathbb{N}$, be the size of a randomly-ordered input array. We have the
following elementary results:



Theorem 1. Given an input of size n, the algorithm Approximate_Median performs
fewer than $\frac{4}{3}n$ comparisons and $\frac{1}{3}n$ exchanges on the average. □



Proof: Consider first the Triplet_Adjust subroutine. In the following table we show the
number of comparisons and exchanges, $C_3$ and $S_3$, for each permutation of three distinct
elements:

    A[i]   A[i+Step]   A[i+2*Step]   Comparisons   Exchanges
     1         2            3             3            0
     1         3            2             3            1
     2         1            3             2            1
     2         3            1             2            1
     3         1            2             3            1
     3         2            1             3            0

Clearly, assuming all orders equally likely, we find $\Pr(C_3 = 2) = 1 - \Pr(C_3 = 3) = 1/3$,
and similarly $\Pr(S_3 = 0) = 1 - \Pr(S_3 = 1) = 1/3$, with expected values
$E[S_3] = 2/3$ and $E[C_3] = 8/3$.

To find the work of the entire algorithm with an input of size n, we multiply the
above by T(n), the number of times the subroutine Triplet_Adjust is executed. This
number is deterministic. We have $T(1) = 0$ and $T(n) = \frac{n}{3} + T(\frac{n}{3})$, for n > 1; for n
which is a power of 3 the solution is immediate: $T(n) = \frac{1}{2}(n - 1)$.

Let $\Pi_n$ be the number of possible inputs of size n and let $\Sigma_n$ be the total number
of comparisons performed by the algorithm on all inputs of size n.
The average number of comparisons for all inputs of size n is:

$$\frac{\Sigma_n}{\Pi_n} = \frac{16 \cdot \frac{n!}{3!} \cdot T(n)}{n!} = \frac{4}{3}(n - 1).$$

To get $\Sigma_n$ we count all the triplets considered for all the inputs, i.e. $n! \cdot T(n)$; for
each triplet we consider the cost over its 3! permutations (the factor 16 is the cost for
the 3! permutations of each triplet*).

The average number of exchanges can be shown analogously, since two out of three
permutations require an exchange; this gives $\frac{2}{3} T(n) = \frac{1}{3}(n - 1)$ exchanges on the average.
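The counts used in the proof can be checked mechanically. The sketch below (helper names are ours) instruments the triplet procedure, enumerates the 3! orderings, and evaluates the recurrence for T(n); it uses exact rational arithmetic so that the averages can be compared with 8/3 and 2/3 directly.

```python
from fractions import Fraction
from itertools import permutations


def triplet_cost(triple):
    """Return (comparisons, exchanges, result) for one run on three distinct keys."""
    a = list(triple)
    comparisons, exchanges = 1, 0          # first test: a[0] < a[1]
    if a[0] < a[1]:
        comparisons += 1
        if a[2] < a[0]:
            a[0], a[1] = a[1], a[0]
            exchanges += 1
        else:
            comparisons += 1
            if a[2] < a[1]:
                a[1], a[2] = a[2], a[1]
                exchanges += 1
    else:
        comparisons += 1
        if a[0] < a[2]:
            a[0], a[1] = a[1], a[0]
            exchanges += 1
        else:
            comparisons += 1
            if a[2] > a[1]:
                a[1], a[2] = a[2], a[1]
                exchanges += 1
    return comparisons, exchanges, a


def triplet_calls(n):
    """T(n) for n a power of 3: T(1) = 0, T(n) = n/3 + T(n/3)."""
    return 0 if n == 1 else n // 3 + triplet_calls(n // 3)


costs = [triplet_cost(p) for p in permutations((1, 2, 3))]
mean_comparisons = Fraction(sum(c for c, _, _ in costs), len(costs))
mean_exchanges = Fraction(sum(s for _, s, _ in costs), len(costs))

if __name__ == "__main__":
    print(mean_comparisons, mean_exchanges, triplet_calls(27))
```

Multiplying the two averages by T(n) = (n − 1)/2 gives the bounds stated in Theorem 1.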

By picking the “worst” rows in the table given in the proof of Theorem 1, it is straightforward
to verify also the following:

Theorem 2. Given an input of size $n = 3^r$, the algorithm Approximate_Median performs
fewer than $\frac{3}{2}n$ comparisons and $\frac{1}{2}n$ exchanges in the worst case. □

For an input size which is a power of 3, the algorithm of Fig. 2 performs nearly
the same operations as the simpler algorithm; in particular, it makes the same key
comparisons, and selects the same elements. For $\log_3 n \notin \mathbb{N}$, their performance only
differs on one tuple per iteration, hence the leading term (and its coefficient) in the
asymptotic expression for the costs is the same as in the simpler case.

The non-local algorithm described in [RB90] performs exactly the same number of 
comparisons as above but always moves the selected median. The overall run-time cost 
is very similar to our procedure. 



4. Probabilistic Performance Analysis 

4.1 Range of Selection 

It is obvious that not all the input array elements can be selected by the algorithm;
e.g., the smallest one is discarded in the first stage. Let us consider first the algorithm of
Fig. 1 (i.e. when n is a power of 3). Let v(n) be the number of elements from the lower
end (alternatively, the upper end, since the Approximate_Median algorithm has bilateral
symmetry) of the input which cannot be selected out of an array of n elements. It is easy
to verify (by observing the tree built with the algorithm) that v(n) obeys the following
recursive inequality:

$$v(3) = 1, \qquad v(n) \ge 2v(n/3) + 1. \qquad (1)$$

Moreover, when $n = 3^r$, the equality holds. The solution of the recursive equation,
$\{v(3) = 1;\ v(n) = 2v(n/3) + 1\}$, is the following function:

$$v(n) = 2^{\log_3 n} - 1 = n^{\log_3 2} - 1.$$

* Alternatively, we can use the following recurrence: $C(1) = 0$ and $C(n) = \frac{n}{3} \cdot c + C(\frac{n}{3})$, for
n > 1, where $c = \frac{16}{6}$ is the average number of comparisons of Triplet_Adjust (because all the
3! permutations are equally likely).

Let x be the output of the algorithm over an input of n elements. From the definition of
v(n) it follows that

$$v(n) < \mathrm{rank}(x) < n - v(n) + 1. \qquad (2)$$

The second algorithm behaves very similarly (they perform the same operations
when $n = 3^r$) and the range function v(n) obeys the same recurrence.

Unfortunately not many entries get thus excluded. The range of possible selection,
as a fraction of the entire set of numbers, increases promptly with n. This is simplest to
illustrate with n that is a power of 3. Since v(n) can be written as $2^{\log_3 n} - 1$, the ratio
$v(n)/n$ is approximately $(2/3)^{\log_3 n}$. Thus, for $n = 3^3 = 27$, where the smallest (and
largest) 7 numbers cannot be selected, 52% of the range is excluded; the comparable
restriction is 17.3% for $n = 3^6 = 729$ and only 1.54% for $n = 3^{12} = 531441$.
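The excluded fraction is easy to tabulate. The following sketch (function names are ours) iterates the recurrence for v(n) when n is a power of 3 and prints, for the three sizes quoted above, the number of unselectable entries at each end and the share 2v(n)/n of the whole range that they represent.

```python
def excluded_per_side(n):
    """v(n) for n = 3**r, r >= 1: v(3) = 1, v(n) = 2*v(n/3) + 1."""
    if n < 3:
        raise ValueError("n must be a power of 3, at least 3")
    v, size = 1, 3
    while size < n:
        v, size = 2 * v + 1, 3 * size
    if size != n:
        raise ValueError("n must be a power of 3")
    return v


def excluded_fraction(n):
    """Share of the ranks 1..n that can never be returned."""
    return 2 * excluded_per_side(n) / n


if __name__ == "__main__":
    for n in (27, 729, 531441):
        print(n, excluded_per_side(n), f"{100 * excluded_fraction(n):.2f}%")
```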

The true state of affairs, as we now proceed to show, is much better: while the pos- 
sible range of choice is wide, the algorithm zeroes in, with overwhelming probability, 
on a very small neighborhood of the true median. 



4.2 Probabilities of Selection 



The most telling characteristic of the algorithms is their precision, which can be expressed
via the probability function

$$P(z) = \Pr[zn < \mathrm{rank}(x) < (1 - z)n + 1], \qquad (3)$$

for $0 \le z \le 1/2$, which describes the closeness of the selected value to the true median.

The purpose of the following analysis is to show the behavior of this distribution.
We consider n which is a power of 3.

Definition 1. Let $q_{a,b}^{(r)}$ be the number of permutations, out of the $n! = 3^r!$ possible ones,
in which the entry which is the a-th smallest in the set is: (1) selected, and (2) becomes
the b-th smallest in the next set, which has $\frac{n}{3} = 3^{r-1}$ entries.

It turns out that this quite narrow look at the selection process is all we need to
characterize it completely.

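It can be shown that

$$q_{a,b}^{(r)} = 2n\,(a-1)!\,(n-a)!\;3^{a-b-1}\binom{\frac{n}{3}-1}{b-1}\sum_{i}\binom{b-1}{i}\binom{\frac{n}{3}-b}{a-2b-i}\frac{1}{9^{i}} \qquad (4)$$

(for details, see [BCC*99]).

It can also be seen that $q_{a,b}^{(r)}$ is nonzero for $0 \le a - 2b \le \frac{n}{3} - 1$ only. The sum is
expressible as a Jacobi polynomial, $\left(\frac{8}{9}\right)^{a-2b} P_{a-2b}^{(v,u)}\left(\frac{5}{4}\right)$, where $u = 3b - a - 1$, $v = \frac{n}{3} + b - a$,
and a simpler closed form is unlikely.

Let $p_{a,b}^{(r)}$ be the probability that item a gets to be the b-th smallest among those selected
for the next stage. Since the $n! = 3^r!$ permutations are assumed to be equally
likely, we have $p_{a,b}^{(r)} = q_{a,b}^{(r)}/n!$:

$$p_{a,b}^{(r)} = \frac{2 \cdot 3^{a-b-1}\binom{\frac{n}{3}-1}{b-1}}{\binom{n-1}{a-1}} \sum_{i}\binom{b-1}{i}\binom{\frac{n}{3}-b}{a-2b-i}\frac{1}{9^{i}}
= \frac{2 \cdot 3^{a-b-1}\binom{\frac{n}{3}-1}{b-1}}{\binom{n-1}{a-1}}\,[z^{a-2b}]\left(1+\frac{z}{9}\right)^{b-1}(1+z)^{\frac{n}{3}-b}. \qquad (5)$$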



This allows us to calculate the center of our interest: the probability $P_a^{(r)}$ of starting
with an array of the first $n = 3^r$ natural numbers, and having the element a ultimately
chosen as approximate median. It is given by

$$P_a^{(r)} = \sum_{b_r} p_{a,b_r}^{(r)} P_{b_r}^{(r-1)} = \sum_{b_r, b_{r-1}, \ldots, b_3} p_{a,b_r}^{(r)}\, p_{b_r,b_{r-1}}^{(r-1)} \cdots p_{b_3,2}^{(2)}, \qquad (6)$$

where $2^{j-1} \le b_j \le 3^{j-1} - 2^{j-1} + 1$, for j = 3, 4, ..., r.

Some telescopic cancellation occurs when the explicit expression for $p_{a,b}^{(r)}$ is used
here, and we get

$$P_a^{(r)} = \left(\frac{2}{3}\right)^{r} \frac{3^{a-1}}{\binom{n-1}{a-1}} \sum_{b_r, \ldots, b_3}\ \prod_{j=2}^{r}\ \sum_{i_j \ge 0} \binom{b_j-1}{i_j}\binom{3^{j-1}-b_j}{b_{j+1}-2b_j-i_j}\frac{1}{9^{i_j}}. \qquad (7)$$

As above, each $b_j$ takes values in the range $[2^{j-1} \ldots 3^{j-1} - 2^{j-1} + 1]$, $b_2 = 2$ and
$b_{r+1} = a$ (we could let all $b_j$ take all positive values, and the binomial coefficients
would produce nonzero values for the required range only). The probability $P_a^{(r)}$ is
nonzero for $v(n) < a < n - v(n) + 1$ only.

This distribution has so far resisted our attempts to provide an analytic characteriza- 
tion of its behavior. In particular, while the examples below suggest very strongly that 
as the input array grows, it approaches the normal distribution, this is not easy to show 
analytically. (See Section 4.4 of [BCC*99] for an approach to gain further information 
about the large-sample distribution.) 



4.3 Examples 

We computed $P_a^{(r)}$ for several values of r. Results for a small array (r = 3, n = 27)
are shown in Fig. 3. By comparison, with a larger array (r = 5, n = 243, Fig. 4) we
notice the relative concentration of the likely range of selection around the true median.
In terms of these probabilities the relation (3) is:

$$P(z) = \sum_{\lfloor zn \rfloor < a < \lceil (1-z)n \rceil + 1} P_a^{(r)}, \qquad (8)$$

where $0 \le z < \frac{1}{2}$.
Fig. 3. Plot of the median probability distribution for n=27.



We chose to present the effectiveness of the algorithm by computing directly from
equation (7) the statistics of the absolute value of the bias of the returned approximate
median, $D_n = |X_n - \mathrm{Md}(n)|$ (where Md(n) is the true median, (n + 1)/2). We
compute its mean (Avg.) and standard deviation, denoted by $\sigma_d$.

A measure of the improvement of the selection effectiveness with increasing (initial)
array size n is seen from the variance ratio $\sigma_d / \mathrm{Md}(n)$. This ratio may be viewed as a
measure of the expected relative error of the approximate median selection algorithm.

Numerical computations produced the numbers in Table 1; note the trend in the
rightmost column. (This trend is the basis for the approach examined in Section 4.4 of
[BCC*99].)



       n    r = log_3 n        Avg.         σ_d       σ_d/√n
       9         2          0.428571     0.494872    0.164957
      27         3          1.475971     1.184262    0.227911
      81         4          3.617240     2.782263    0.309140
     243         5          8.096189     6.194667    0.397388
     729         6         17.377167    13.282273    0.491958
    2187         7         36.427027    27.826992    0.595034

Table 1. Statistics of the median selection as function of array size.
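For small r the statistics of Table 1 can be recomputed with exact rational arithmetic. The sketch below is our own illustration: it assumes the one-stage probability $p_{a,b}^{(r)}$ has the binomial-sum form obtained by counting how the entries smaller than a are distributed over the triplets, and it applies the stage-by-stage recursion rather than the expanded sum (7). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt


def binom(n, k):
    """Binomial coefficient that is zero outside 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def p_stage(r, a, b):
    """Probability that rank a of 3**r entries becomes rank b of the survivors."""
    n = 3 ** r
    m = n // 3
    if not (1 <= b <= m and 2 * b <= a <= n):
        return Fraction(0)
    total = sum(Fraction(binom(b - 1, i) * binom(m - b, a - 2 * b - i), 9 ** i)
                for i in range(b))
    return 2 * Fraction(3) ** (a - b - 1) * binom(m - 1, b - 1) * total / binom(n - 1, a - 1)


def selection_probabilities(r):
    """Map each rank a in 1..3**r to the probability that it is finally returned."""
    probs = {1: Fraction(1)}                     # a single entry is always chosen
    for stage in range(1, r + 1):
        n = 3 ** stage
        probs = {a: sum(p_stage(stage, a, b) * pb for b, pb in probs.items())
                 for a in range(1, n + 1)}
    return probs


def bias_statistics(r):
    """Mean and standard deviation of |selected rank - (n+1)/2|."""
    n = 3 ** r
    md = Fraction(n + 1, 2)
    probs = selection_probabilities(r)
    mean = sum(p * abs(a - md) for a, p in probs.items())
    second = sum(p * (a - md) ** 2 for a, p in probs.items())
    return mean, sqrt(second - mean ** 2)


if __name__ == "__main__":
    mean, sigma = bias_statistics(2)
    print(float(mean), sigma)
```

For r = 2 the mean is exactly 3/7 ≈ 0.428571 and the standard deviation is √12/7 ≈ 0.494872, in agreement with the first row of Table 1.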
Fig. 4. Plot of the median probability distribution for n=243.



5. Experimental Results 

In this section we present empirical results, demonstrating the effectiveness of the algorithms,
also for the cases which our analysis does not handle directly. Our implementation
is in standard C (GNU C compiler v2.7). All the experiments were carried out on
a PC Pentium II 350 MHz with the Linux (Red Hat distribution) operating system. The
lists were permuted using the pseudo-random number generator suggested by Park and
Miller in 1988 and updated in 1993 [PM88]. The algorithm was run on random arrays
of sizes that were powers of 3, $n = 3^r$, with $r \in \{3, 4, \ldots, 11\}$, and others. The entry
keys were always the integers 1, ..., n.

The following tables present results of such runs. They report the statistics of $D_n$,
the absolute value of the bias of the approximate median. For each returned result we
compute its distance from the correct median. The units we use in the tables are
“normalized” values of $D_n$, denoted by d%; these are percentiles of the possible range
of error of the algorithm: $d\% = 100 \times \frac{D_n}{(n-1)/2}$. The extremes are d% = 0 when the
true median is returned, and it would have been 100 if it were possible to return the
smallest (or largest) element. (Relation (2) shows that d% can get arbitrarily close
to 100 as n increases, but never quite reach it.) Moreover, the probability distributions
of the last section suggest, as illustrated in Figure 4, that such deviations are extremely
unlikely. Table 2 shows selected results, using a threshold value of 8. All experiments
used a sample size of 5000, throughout.
        n     Avg.      σ     Avg. + 2σ   Rng(95%)    (Min-Max)
       50    10.27    7.89     26.05       24.49     0.00-44.90
      100     8.45    6.63     21.70       20.20     0.00-48.48
      3^5     6.74    5.13     17.00       16.53     0.00-38.02
      500     5.80    4.41     14.63       14.03     0.00-29.06
      3^6     4.83    3.71     12.26       12.09     0.00-24.73
     1000     4.70    3.66     12.02       11.61     0.00-23.62
      3^7     3.32    2.54      8.41        8.05     0.00-16.83
     5000     2.71    2.10      6.91        6.72     0.00-17.20
      3^8     2.31    1.75      5.81        5.67     0.00-11.95
    10000     2.53    1.86      6.24        6.04     0.00-11.38
      3^9     1.58    1.18      3.94        3.86     0.00-6.78

Table 2. Simulation results for d%, fractional bias of approximate median. Sample size
= 5000.



The columns are: n - array size; Avg. - the average of d% over the sample; σ - the
sample standard-error of d%; Rng(95%) - the size of an interval symmetric around
Avg. that contains 95% of the returned values; the last column gives the extremes of d%
that were observed. In the rows that correspond to those of Table 1, the agreement of the
Avg. and σ columns is excellent (the relative differences are under 0.5%). Columns 4
and 5 suggest the closeness of the median distribution to the Gaussian, as shown above.

All the entries show the critical dependence of the quality of the selection on the
size of the initial array. In the following table we report the data for different values of
n with sample size of 5000, varying the threshold.







                     t = 8                     t = 26                    t = 80
        n     Avg.    σ    Rng.(95%)    Avg.    σ    Rng.(95%)    Avg.    σ    Rng.(95%)
      100     8.63   6.71    22.22      6.80   5.31    16.16      4.64   3.61    12.12
      500     5.80   4.41    14.43      4.40   3.30    10.82      3.22   2.40     7.62
     1000     4.79   3.67    12.01      3.80   2.88     9.41      2.98   2.27     7.41
    10000     2.54   1.87     6.05      1.67   1.28     4.14      1.40   1.06     3.44

Table 3. Quality of selection as a function of threshold value.



As expected, increasing the threshold — the maximal size of an array which is sorted, 
to produce the exact median of the remaining terms — provides better selection, at the 
cost of rather larger processing time. For large n, threshold values beyond 30 provide 
marginal additional benefit. Settling on a correct trade-off here is a critical step in tuning 
the algorithm for any specific application. 

Finally we tested for the relative merit of using quintets rather than triplets when 
selecting for the median. In this case n = 1000, Threshold=8, and sample size=5000. 
                Avg.      σ     Avg. + 2σ   Rng(95%)    (Min-Max)
    Triplets    4.70    3.66     12.02       11.61     0.00-23.62
    Quintets    3.60    2.74      9.08        9.01     0.00-16.42

Table 4. Comparing selection via triplets and quintets.



6. Conclusion 

We have presented an approximate median finding algorithm, and an analysis of its
characteristics. Both can be extended. In particular, the algorithm can be adapted to
select an approximate k-th element, for any $k \in [1, n]$. The analysis of Section 4 can be
extended to show how to compute with the exact probabilities, as given in equation (7).
Also, while we know that the limiting distribution of the bias D with respect to the true median
is extremely close to a Gaussian distribution, we have no efficient representation
for it yet.
Abstract. A resource-free characterization of some complexity classes
is given by means of the predicative recursion and constructive diagonalization
schemes, and of restrictions to substitution. Among other classes,
we predicatively harmonize in the same hierarchy PTIMEF, the class $\mathcal{E}$ of
the elementary functions, and classes DTIMESPACEF($n^p$, $n^q$).

Keywords: time-space classes, implicit computational complexity, ele- 
mentary functions. 



1 Introduction 

Position of the problem. The standard definition of a complexity class involves
the definition of a bound imposed on time and/or space resources used by a
Turing Machine during its computation; a different approach characterizes complexity
classes by means of limited recursive operators. The first characterization
of this type of a small complexity class was given by Cobham [8], who showed
that the polytime functions are exactly those functions generated by bounded recursion
on notation; however, a smash initial function was used to provide
space enough.

Leivant [12] and Bellantoni & Cook [2] gave characterizations of PTIMEF;
several other complexity classes have been characterized by means of unlimited
operators (see [13,5] for PSPACEF, [4] for PTIME, PSPACE (languages), PH and
its elements, [1,3] for AfV, [14] for pspacef and the class of the elementary 
functions, [7] for the definition of a time-space hierarchy between PTIMEF and
PSPACEF). All these approaches have been dubbed Implicit Computational Complexity:
they share the idea that no explicitly bounded schemes are needed to
characterize a great number of classes of functions and that, in order to do this, it
suffices to distinguish between safe and unsafe variables (or, following Simmons
[17], between dormant and normal ones) in the recursion schemes. This distinction
yields many forms of predicative recurrence, in which the function being defined
cannot be used as a counter in the defining one.

Statement of the result. We define a safe recursion scheme SREC on a ternary
word algebra, such that $f(x, y, za) = h(f(x, y, z), y, za)$, where x, y, z are, respectively,
the auxiliary variable, the parameter and the recursion variable; no
other type of variables can be used, and the identification of z with x is not
allowed. We also define a constructive diagonalization scheme CDIAG, such that
$f(n) = \{e(n)\}(n)$, where {m} is the standard Kleene notation for the function
coded by m and e is a given enumerator.

Starting from a characterization $T_1$ of LINTIMEF, and from an assignment of
fundamental sequences $\lambda_n$ for ordinals $\lambda < \varepsilon_0$ (see [16] for further details on
ordinals), we define the hierarchy $\{T_\alpha\}_{\alpha < \varepsilon_0}$ as follows:

1. at each successor ordinal $\alpha + 1$, $T_{\alpha+1}$ is the class of all functions obtained by one
application of safe recursion to functions in $T_\alpha$;

2. at each limit ordinal $\lambda$, $T_\lambda$ is the class of all functions obtained by one
application of constructive diagonalization in an enumerator $e \in T_{\lambda_1}$, such
that $\{e(n)\} \in T_{\lambda_n}$.

Given an ordinal $\alpha$ in Cantor normal form, $B_\alpha(n)$ is $\max(2, n)^{\mathrm{clps}(\alpha, n)}$, where
clps($\alpha$, n) is the result of replacing $\omega$ by n in $\alpha$. We have that:

1. for all finite k, $T_k$ = DTIMEF($n^k$);

2. for all $\omega \le \alpha < \varepsilon_0$, DTIMEF($B_\alpha(n)$) $\subseteq T_\alpha \subseteq$ DTIMEF($B_\alpha(n + O(1))$).

Thus, $\bigcup_{\beta < \omega} T_\beta$ = PTIMEF and $\bigcup_{\beta < \varepsilon_0} T_\beta$ = the elementary functions.

In analogy with $\{T_\alpha\}_{\alpha < \varepsilon_0}$ we define a hierarchy $\{S_\alpha\}_{\alpha < \varepsilon_0}$ of space-increasing
functions and, by means of a restricted form of substitution, we define a time-space
hierarchy $\{TS_{qp}\}_{q,p < \omega}$, such that
3. $TS_{qp}$ = DTIMESPACEF($n^p$, $n^q$).



2 Constants, Basic Functions, and Definition Schemes

2.1 Recursion Free Functions

T is the ternary alphabet {0, 1, 2}. p, q, ..., s, ... are the word 0 or words over
T not beginning with 0; t is the empty word. B is the binary alphabet {1, 2}.
U, V, ..., Y are words over B. $a, b, a_1, \ldots$ are letters of T or B.

The i-th component $(s)_i$ of a word s of the form $Y_n 0 Y_{n-1} 0 \ldots 0 Y_2 0 Y_1$ is $Y_i$. The
rationale of this definition is that ternary words are actually handled as tuples
of binary words, with the zeroes playing the role of commas. |s| is the length of
the word s.

We denote with x, y, z variables used as, respectively, auxiliary, parameter and
principal in the construction of the current function. f(s, t, r) is the result of
assigning words s, t, r to x, y, z. By a notation like f(x, y, z) we always allow
some among the indicated variables to be absent.

Definition 1. Given $i \ge 1$, a = 1, 2 and u = x, y, z, the initial functions are the
following unary functions:

1. the identity i(u), which returns the value s assigned to u; sometimes we write
s instead of i(s);
2. the constructor $c_i^a(u)$, which, when s is assigned to u, adds the digit a at
the right of the last digit of $(s)_i$; it leaves s unchanged if $(s)_i$ is a single
letter;

3. the destructor $D_i(u)$, which, when s is assigned to u, (a) erases the rightmost
digit of $(s)_i$, if $(s)_i$ is not a single digit; (b) returns 0, otherwise. Constructors
and destructors leave s unchanged if it has less than i components.

Example 1. $c_1^1(12022) = 120221$; $D_2(1010) = 100$; $D_2(100) = 100$.

Definition 2. Given $i \ge 1$ and b = 1, 2, we have the following simple schemes:

1. $f = \mathrm{IDT}_x(g)$ is the result of the identification of x as y in g;

2. $f = \mathrm{IDT}_z(g)$ is the result of the identification of z as y in g;

3. $f = \mathrm{ASG}_u(s, g)$ is the result of the assignment of s to the variable u in g;

4. $f = \mathrm{BRANCH}_i^b(g, h)$ is defined by branching in g and h if for all s, t, r we have
f(s, t, r) = if the rightmost digit of $(s)_i$ is b then g(s, t, r) else h(s, t, r).

Example 2. $f = \mathrm{IDT}_x(g)$ implies f(t, r) = g(t, t, r). Similarly, $f = \mathrm{IDT}_z(g)$ implies
f(s, t) = g(s, t, t). Let s be the word 110212, and $f = \mathrm{BRANCH}_2^1(g, h)$; we have
f(s, t, r) = g(s, t, r), since the rightmost digit of $(s)_2$ is 1.

Definition 3. A modifier is the sequence composition of n constructors and m 
destructors. 

Definition 4. Class $T_0$ is the closure of modifiers under $\mathrm{BRANCH}_i^b$ and composition.

Definition 5. 1. The rate of growth rog(g) of a modifier g is n − m, where n
and m are respectively the number of constructors and destructors occurring
in g.

2. For all $f \in T_0$ built up by means of some branchings from modifiers $g_1, \ldots, g_k$,
we have $\mathrm{rog}(f) := \max_{i \le k} \mathrm{rog}(g_i)$.

Definition 6. Class $S_0$ is the class of functions in $T_0$ with non-positive rate of
growth, that is $S_0 = \{f \in T_0 \mid \mathrm{rog}(f) \le 0\}$.

Notice that all functions in $T_0$ are unary, and they modify their inputs according
to the result of some test performed over a fixed number of digits. Functions in
$S_0$, or their iteration, cannot return values longer than their input.

2.2 Safe Recursion and Diagonalization 

Definition 7. f = SREC(g, h) is defined by safe recursion in the basis function
g(x, y) and in the step function h(x, y, z) if for all s, t, r we have

    f(s, t, a) = g(s, t)
    f(s, t, ra) = h(f(s, t, r), t, ra).

In particular, f = ITER(h) is defined by iteration of function h(x) if for all s, r
we have

    f(s, a) = s
    f(s, ra) = h(f(s, r)).

We write $h^{|r|}(s)$ for ITER(h)(s, r) (i.e. the |r|-th iteration of h on s).



Definition 8. f = CDIAG(e) is defined by constructive diagonalization in the
enumerator e if for all s, t, r we have

    f(s, t, r) = {e(r)}(s, t, r)

where {m} denotes the function coded by m.



Definition 9. f = CMP(h, g) is defined by composition of h and g if f(u) =
g(h(u)), with h or g in $T_0$.

Definition 10. Class $T_1$ (resp. $S_1$) is the closure under simple schemes and CMP
of functions obtained by one application of ITER to $T_0$ (resp. $S_0$).

Note that since identification of z as x is not allowed (see Definition 2), the step
function cannot assign the previous value of the function being defined by SREC
to the recursion variable. Thus, we obtain that z is a dormant variable, according
to Simmons’ approach (see [17]), or a safe one, following Bellantoni & Cook:
we always know in advance the number of recursive calls of the step function,
and this number will never be affected by the previous values of f.

Example 3. By a sequence of SREC’s we now define a sequence of functions $g_n$
which, at m, compute in unary $m^n$, that is, such that $|g_n(a, t)| < |t|^n$. We then
use the fact that the generation of the $g_n$’s is uniform in n to define by CDIAG a
function which computes in unary $m^m$. That is:

Ql ■ — CMP(ITER(C]^ ),D]^ ), fn-\-l *~SltEC(^j^, gn-Vl 

We have 

fgi(s,a) = gn{s,t) (fn+i{s,t,a) = gn{s,t) 

\gi(s,ra) = cj(gi (s,r)) ’ \fn+i{s,t,ra) = gn{fn+i{s,t,r),t). 

By induction on n and r one sees that we have |/n+i(s, t, r)| = jsj + jtl^lrj and, 
therefore, \gn{a,t)\ < \t\'^. Assume now defined a function e E Ti such that 
e(r) g\r\ ^ . If we define :=CDiAG(e), we have |/(^(a, t,t)\ = g\t\{a, t) = 



2.3 Standard Computability 

As model of computation we use the push-down Turing machine, and we give
the definition of inclusion between classes of TM’s and classes of functions.
Definition 11. A binary push-down Turing machine M is defined as follows:

- M has k push-down tapes over the alphabet B and m + 1 states (0 denotes
the final state, 1 the initial state);

- the description of M consists of m rows of the type

    R_i = (i, j(i), i_1, j_1, I_1, i_2, j_2, I_2, i_3),

(one for each non-final state) where:

(i) $i, i_1, i_2, i_3$ are states, with $i \ne 0$;

(ii) $j(i), j_1, j_2$ are tapes, with j a given function;

(iii) $I_1, I_2$ are defined on the set of instructions {pop, push 1, push 2};

- each row of M should be intended as

    if the current state is i then
        if top(j(i)) = 1 then enter i_1;
            apply I_1 to j_1;
        if top(j(i)) = 2 then enter i_2;
            apply I_2 to j_2;
        if j(i) is empty then enter i_3.

Given a push-down TM M with k tapes, $T_i = X$ means that the content of tape
$T_i$ is the binary word X.
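A row-driven interpreter makes the definition concrete. The sketch below is only an illustration under assumptions of ours: each row is stored as the tuple (j, i_1, j_1, I_1, i_2, j_2, I_2, i_3) keyed by its state i, tapes are numbered from 0, a tape is a Python list whose last element is its top, and a pop applied to an empty tape does nothing. The function name and the step limit are hypothetical.

```python
def run_pdtm(rows, tapes, max_steps=10_000):
    """Run a binary push-down TM from state 1 until the final state 0."""
    state, steps = 1, 0
    while state != 0:
        if steps >= max_steps:
            raise RuntimeError("step limit exceeded")
        j, i1, j1, ins1, i2, j2, ins2, i3 = rows[state]
        if not tapes[j]:
            state = i3                      # tested tape is empty
        else:
            if tapes[j][-1] == 1:
                state, target, ins = i1, j1, ins1
            else:
                state, target, ins = i2, j2, ins2
            if ins == "pop":
                if tapes[target]:
                    tapes[target].pop()
            elif ins == "push 1":
                tapes[target].append(1)
            elif ins == "push 2":
                tapes[target].append(2)
            else:
                raise ValueError(f"unknown instruction {ins!r}")
        steps += 1
    return tapes


# Example machine: copy tape 0 onto tape 1 in reverse order, emptying tape 0.
# State 1 pushes a copy of the top symbol of tape 0 onto tape 1;
# state 2 pops that symbol from tape 0 and returns to state 1.
REVERSE = {
    1: (0, 2, 1, "push 1", 2, 1, "push 2", 0),
    2: (0, 1, 0, "pop", 1, 0, "pop", 0),
}

if __name__ == "__main__":
    print(run_pdtm(REVERSE, [[1, 2, 2], []]))
```

Two states are needed here because a row applies a single instruction, so copying and popping cannot happen in the same step.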

Note that the previous model and the ordinary Turing machine model are
equivalent, with respect to the order of time needed to compute a given
function. In fact, let M be an ordinary TM, with n tapes unlimited to the left,
alphabet B and a fixed set of states. M can be simulated by a push-down TM N
with 2n tapes and the same number of states, which stores the content of the
i-th tape of M at the left of the observed symbol in its tape 2i-1, and the
part at the right in its tape 2i. If M moves left on tape i, N pops a symbol
from tape 2i-1 and pushes it into tape 2i; similarly, if M moves right on
tape i, N pops a symbol from tape 2i and pushes it into tape 2i-1.
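The simulation of one ordinary tape by two push-down tapes can be sketched as follows. The class name `TwoStackTape` is hypothetical, and the sketch assumes that the observed symbol is kept on top of the right-hand stack, a detail the text leaves open.

```python
class TwoStackTape:
    """One ordinary tape held as two stacks around the head position."""

    def __init__(self, cells, head):
        # left: cells to the left of the head; its top is the cell next to the head
        self.left = list(cells[:head])
        # right: observed cell and everything to its right, stored in reverse,
        # so that the observed cell is on top (assumption of this sketch)
        self.right = list(reversed(cells[head:]))

    def read(self):
        """Return the observed symbol, or None past the right end."""
        return self.right[-1] if self.right else None

    def move_left(self):
        """Pop from the left stack and push onto the right stack."""
        if self.left:
            self.right.append(self.left.pop())

    def move_right(self):
        """Pop from the right stack and push onto the left stack."""
        if self.right:
            self.left.append(self.right.pop())

    def contents(self):
        """Return the whole tape content, left to right."""
        return self.left + list(reversed(self.right))
```

Every head move costs one pop and one push, which is why the simulation preserves the order of the running time.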

Definition 12. 1. A push-down Turing machine M, on input $s = X_1 0 \ldots 0 X_n$,
standard computes $q = Y_1 0 \ldots 0 Y_m$ (written $M(s) =_{sc} q$) if it starts
with $T_i = X_i$ ($1 \le i \le n$) and stops with $T_j = Y_j$ ($1 \le j \le m$).

2. M standard computes the function f ($M =_{sc} f$) if $f(s) = q$ implies that
$M(s) =_{sc} q$.

It is natural to observe that the number of tapes of the Turing machine which 
standard computes a function must be independent from the number of compo- 
nents of its input; following the previous definition, a new Turing machine should 
be defined for each possible number of components of the input of /. However, 
when we are defining a TM that standard computes a function /, we need a 
number of tapes that depends only on the maximum number of components of 
the input that / can manipulate or check with a constructor, a destructor or a 
branching. 

Definition 13. 1. Given $f \in T_1$, we define the number of components of f
(denoted with #(f)) as $\max\{i \mid$ a $D_i$ or a $c_i^a$ or a
$\mathrm{BRANCH}_i^b$ occurs in $f\}$.




244 Emanuele Covino, Giovanni Pani, and Salvatore Caporaso 



2. Given a function f, we define the length of f (denoted with lh(f)) as the
number of constructors, destructors and defining schemes occurring in its
construction.



Definition 14. Given the class of time-bounded push-down TM’s DTIME(p(n))
and given C a class of functions, we say that

1. $\mathrm{DTIME}(p(n)) \subseteq C$ if for all $M \in \mathrm{DTIME}(p(n))$ there
exists a function $f \in C$ such that $M(s) =_{sc} f(s)$;

2. $C \subseteq \mathrm{DTIME}(p(n))$ if for all $f \in C$ there exists a push-down
TM M with #(f) tapes such that, for all s, t, r, M returns f(s, t, r) in time
p(|s| + |t| + |r|).

Let U be the finite alphabet that we use to write our functions; the code of a 
function / is obtained by concatenation of the codes for the letters of U which 
compose /; the arity associated with each letter ensures unique parsing. 

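Definition 15. The code $\lceil L \rceil$ of the i-th letter L of U is $2^{i+1}1$.
If the arity of $L \in U$ is n then $\lceil E_n \rceil \ldots \lceil E_1 \rceil \lceil L \rceil$
codes the expression $L E_1 \ldots E_n$. We write $\langle E_1, \ldots, E_n \rangle$
for $\lceil E_n \rceil \ldots \lceil E_1 \rceil$.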

In the same way we can define the code of a push-down TM. 

Definition 16. Let M be a binary push-down TM with k tapes and m + 1 
states. 

1. The code of the row Ri = (1 < f < m) is the 

word 

2. The code of M is the word $\langle R_1, \ldots, R_m \rangle$.

3. An instantaneous description of M is coded by the word $X_1, \ldots, X_k$, state,
where each $X_i$ encodes the i-th tape of M, and state encodes the current
state.

Lemma 1. $T_1 = \mathrm{DTIMEF}(n)$.

Proof. We first show (by induction on the construction of f) that each function
$f \in T_1$ can be computed by a push-down TM in time lh(f)n. Base. $f \in T_0$.
The result follows from the definition of initial functions and CMP (see
definition 4).
Step. Case 1. f = ITER(g), with $g \in T_0$. We have $f(s, r) = g^{|r|}(s)$. A
push-down TM $M_f$ with #(f) + 1 tapes can be defined as follows: with $(s)_i$
on tape i ($1 \le i \le \#(f)$) it computes g (in time lh(g)) and, after |r|
repetitions, it stops returning the final result. Thus, $M_f$ standard computes
f(s, r) within time |r| lh(g).

Case 2. Let f be defined by branching or CMP. The result follows by direct
simulation of the schemes.

In order to prove the second inclusion, we show that the behaviour of an m-tape
TM M (with linear time bound cn) can be simulated by a function in $T_1$. Indeed
a function $nxt_M$ can be defined in $T_0$, which uses two components for each
tape of M, one for the part at the left of the observed symbol, one for the part
at the right (read in reverse order); the internal state is stored in the
(2m + 1)-th component. $nxt_M(s)$ has the form if state[i](s) and top[b](j(s))
then $E_{i,b}$, where: (a) state[i](s) is a test which is true iff the state of M
is coded by i; (b) top[b](j(s)) is a test which is true iff the observed
character on tape j(s) is b; (c) $E_{i,b}$ is a sequence of modifiers which update
the code of the state and part of the tape, according to the definition of M.
By means of c - 1 CMP’s we define in $T_0$ the function $nxt_M^c$, which applies
c times the function $nxt_M$ to the word that encodes an instantaneous
description of M. Define now in $T_1$

$linsim_M(x, a) = x$
$linsim_M(x, za) = nxt_M^c(linsim_M(x, z))$

We have that $linsim_M(s, t)$ iterates $nxt_M(s)$ for c|t| times, returning the
code of the ID of M which contains the final result.
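The two equations defining the linear simulation can be read as a loop over the digits of the second argument. The sketch below is a hedged illustration: `compose_c_times` and `linsim` are hypothetical names, the step function stands in for the next-configuration function, and configurations are arbitrary Python values rather than coded words. As the equations are printed, the base case consumes one digit, so the loop applies the c-fold step once for each further digit.

```python
def compose_c_times(step, c):
    """Return the function that applies `step` c times (c - 1 compositions)."""
    def stepped(config):
        for _ in range(c):
            config = step(config)
        return config
    return stepped


def linsim(step_c, x, t):
    """Recursion on the notation of t.

    linsim(x, a)  = x                     (t is a single digit)
    linsim(x, za) = step_c(linsim(x, z))  (one more digit, one more application)
    """
    result = x
    for _ in t[1:]:
        result = step_c(result)
    return result


# Toy usage: a "machine" whose configuration is a counter incremented per step.
nxt = lambda n: n + 1
nxt_c = compose_c_times(nxt, 3)
```

With the toy step above, `linsim(nxt_c, 0, "1221")` performs three applications of the 3-fold step, so the number of simulated machine steps grows linearly with the length of `t`.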

3 The Hierarchies 

3.1 Ordinals 

In this section we define a hierarchy of functions $\{T_\alpha\}_{\alpha < \epsilon_0}$,
starting from $T_1$, by means of closures under safe recursions and
diagonalizations, such that $T_{\alpha+1}$ is defined by one application of safe
recursion to $T_\alpha$, and $T_\lambda$, for the limit ordinal $\lambda$, is
obtained by diagonalization on the classes $T_{\lambda_n}$.

In the rest of this paper small Greek letters are ordinal numbers, with
$\lambda$, $\rho$ limit ordinals; $\lambda_n$ is the n-th element of the
fundamental sequence assigned to $\lambda$. Recalling the definition of the
standard assignment of fundamental sequences for all $\lambda < \epsilon_0$
(cf. [16], page 78), we introduce a slightly modified assignment.

' n if A < 

^ if Cantor normal form for A is ui^ 

" if Cantor normal form for A is 

p + (uJ°^)n if Cantor normal form for A is p + 

We now define a hierarchy of functions Ba{n) := max(2, where 

$clps(\alpha, n)$ is the result of replacing $\omega$ by n in the Cantor normal form of $\alpha$. By
simple computation one can see that Bm{n) = n™, B^c{n) = (n) = n” , 

Buj^{n) = n” (c times). 
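The collapsing operation, which replaces the symbol omega by a number n in a Cantor normal form, can be sketched as a short recursion. The representation is an assumption made for this illustration only: an ordinal below epsilon_0 is a list of (exponent, coefficient) pairs, each exponent being again such a list, and the helper names `nat` and `omega_pow` are hypothetical.

```python
def nat(m):
    """The finite ordinal m, written as omega^0 * m (the empty list is 0)."""
    return [([], m)] if m else []


def omega_pow(exponent):
    """The ordinal omega^exponent."""
    return [(exponent, 1)]


def clps(alpha, n):
    """Replace omega by n in the Cantor normal form of alpha."""
    return sum(coeff * n ** clps(exp, n) for exp, coeff in alpha)


OMEGA = omega_pow(nat(1))

# omega^2 * 2 + omega + 4
example = [(nat(2), 2), (nat(1), 1), ([], 4)]
```

For instance, collapsing omega^3 at n gives n^3 and collapsing omega^omega at n gives n^n, which is the kind of growth the hierarchy of bounds is built on.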

Definition 17. Given $1 \le \alpha < \epsilon_0$ and $\lambda$ a limit ordinal,

1. $T_{\alpha+1}$ is the closure under the simple schemes and CMP of the class of
functions obtained by one application of SREC to $T_\alpha$.

2. $T_\lambda$ is the closure under the simple schemes and CMP of the class of
functions obtained by CDIAG(e), where $e \in T_{\lambda_1}$, $\{e(r)\} \in T_{\lambda_{|r|}}$.

Example 4. In example 3 we have defined a sequence of functions $g_n$, with
n > 1; according to definition 17 one can easily verify that gn G %i, and that 



Definition 18. Given $1 \le \alpha < \epsilon_0$ and $\lambda$ a limit ordinal,

1. $S_{\alpha+1}$ is the closure under the simple schemes of the class of
functions obtained by one application of SREC to $S_\alpha$.

2. $S_\lambda$ is the closure under the simple schemes of the class of functions
obtained by CDIAG(e), where $e \in S_{\lambda_1}$, $\{e(r)\} \in S_{\lambda_{|r|}}$.



Definition 19. f = WSBST(h, g) is defined by weak substitution of h in g if
$f(x, y, z) = g(h(x, y, z), y, z)$.



Definition 20. For all positive p, q, $TS_{qp}$ is the class of functions defined
by weak substitution of h in g, with $h \in T_q$, $g \in S_p$ and $q < p$.



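Theorem 1. 1. For all finite k, $T_k = \mathrm{DTIMEF}(n^k)$.

2. For all $\omega \le \alpha < \epsilon_0$, $\mathrm{DTIMEF}(B_\alpha(n)) \subseteq T_\alpha \subseteq \mathrm{DTIMEF}(B_\alpha(n + 4))$.

3. $TS_{qp} = \mathrm{DTIMESPACEF}(n^p, n^q)$.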

Proof. 1. Let $f \in T_k$. In lemma 2 a TM which interprets the function f is
defined, and its runtime is proved to be in $\mathrm{DTIMEF}(n^k)$. Let M be a TM in
$\mathrm{DTIMEF}(n^k)$. By lemma 3 we have that the iteration $n^k$ times of $nxt_M$
can be defined in $T_k$; this function, starting from the code of the initial
configuration of M, returns the code of its final configuration.

2. The first inclusion follows by lemma 3, and the second by lemma 2.

3. The two inclusions follow by lemma 6 and lemma 5.

The following results immediately follow from theorem 1.

Corollary 1. 1. DTIMEF(n") C CDTIMEF((n + 4)("-+"^)). 

2- =PTIMEF. 

3- U/3<eo ^ 

4 Proofs 

4.1 Simulation of Functions by TM’s 

Lemma 2. For all $\alpha \ge \omega$ we have $T_\alpha \subseteq \mathrm{DTIMEF}(B_\alpha(n + 4))$.

For all finite m, we have $T_m \subseteq \mathrm{DTIMEF}(B_m(n))$.

Proof. In what follows we will define a TM INT, which interprets the input
$\lceil f \rceil, s, t, r$, returning the value of f applied to its arguments.
Given a function $f \in T_\alpha$, we need to know, while designing INT, the
number of tapes it uses. In order to do this, define d such that (a)
$|s| + |t| + |r| \le d$ or (b) no CDIAG occurs in f and $\#(g) \le d$, for each
$g \in T_1$ occurring in the definition of f. This means that the parts of the
arguments which can be modified (and thus the number of tapes of INT) depend
on d. At the end of this proof we reduce INT to a two-tape TM with a
logarithmic increment in the time bound.

The interpreter INT uses the following stacks:

(a) $T^x$, $T^y$, $T^z$, to store the values of x, y, z during the computation;
each of them consists of d tapes, one for each of the modifiable parts of the
value assigned to x; their initial values are, respectively, s, t, r;

(b) $T^u$, to store the value of the principal variable of the current recursion;

(c) $T^f$, to store (the codes of) some sub-functions of f; its initial value is
the code of f.

INT repeats, as long as $T^f$ is not empty, the following cycle:

- it pops a function k from the top of $T^f$, and un-nests the outermost
sub-function j of k;

- according to the form of j, it carries out different actions on the stacks;

- if the form of j is iTER(g) with g E Tq, it calls an interpreter ITER for 7) 
which simulates g on T® for |t| times, where t is on the top of T^; 

- in all other cases, it pushes into $T^f$ a record of the form j MARK k,
where MARK informs about the outermost scheme used to define j.

Thus we define

INT($\lceil f \rceil$, s, t, r) :=

$T^f := \lceil f \rceil$; $T^x := s$; $T^y := t$; $T^z := r$;

while $T^f$ not empty do A := last record(s) of $T^f$;

case

A = CMP(g, h) then push g h into $T^f$

A = ASG_x(p, g) then push g into $T^f$; copy p into …

A = IDT_x(h) then push h into $T^f$; copy last record of … into …

A = BRANCH_i(g, h) then pop $T^f$;
if top($(T^x)_i$) = b then push g into $T^f$
else push h into $T^f$

A = DIAG(h) then push DG h into $T^f$; copy last record of … into $T^u$

A = DG then pop $T^f$; pop last record of … and push it into $T^f$;
pop last record of $T^u$ and push it into …

A = SREC(g, h) then push A RC g into $T^f$; copy last record of $T^z$ into $T^u$;
push last digit of $T^u$ into …

A = SREC(g, h) RC then if $T^u$ = … then pop $T^f$; pop $T^u$; pop …
else push h into $T^f$; push last digit of $T^u$ into …

A = ITER(g) then call ITER.

end case;
end while.
We now show that for all f, s, t, r respecting the imposed condition over d we
have

$f \in T_\alpha \Rightarrow INT(\lceil f \rceil, s, t, r) = f(s, t, r)$ within time $|s| + |\lceil f \rceil| B_\alpha(|t| + |r| + l_\alpha)$

where $l_\alpha = 0$ if $\alpha < \omega$ and $l_\alpha = 1$ otherwise. The result
follows, since every function $f \in T_\alpha$ is then computed in
$\mathrm{DTIMEF}(B_\alpha(n + l_\alpha))$ by the composition of the constant-time
TM writing the code of f with INT.

Define m := |s|, n := |t| + |r|, c := $|\lceil f \rceil|$. We show that, for all
$f \in T_\alpha$, INT moves within $m + cB_\alpha(n + l_\alpha)$ steps from an
instantaneous description of the form

$T^f = Z\lceil f \rceil$; $T^x = s_0 s$; $T^y = t_0 t$; $T^z = r_0 r$; $T^u = q$,
to a new instantaneous description of the form

$T^f = Z$; $T^x = s_0 f(s, t, r)$; $T^y = t_0 t$; $T^z = r_0 r$; $T^u = q$.

Induction on the construction of f. Basis. Let $f \in T_1$. We have $l_\alpha = 0$.
The complexity of ITER is obviously bounded by m + cn.

Step. Case 1. f = SREC(g, h). We have $\alpha = \beta + 1$; let r be the word
$a_{|r|} \ldots a_1$. By the inductive hypothesis, INT needs time
$\le m + |\lceil g \rceil| B_\beta(n + l_\alpha)$ to produce the instantaneous
description

$T^f = Z\lceil f \rceil$ RC; $T^x = s_0 g(s, t, a_1)$; $T^y = t_0 t$; $T^z = r_0 r a_1$; $T^u = qr$.

If |r| > 1 then INT puts $T^f := Z \lceil f \rceil$ RC $\lceil h \rceil$ and
$T^z := r_0 r a_2 a_1$, and calls itself in order to compute h and the next value
of f. By the inductive hypothesis we have that INT needs time
$\le |g(s, t, a_1)| + |\lceil h \rceil| B_\beta(n + l_\alpha)$ to produce an
instantaneous description of the form

$T^f = Z\lceil f \rceil$ RC; $T^u = qr$;

$T^x = s_0 (h(g(s, t, a_1), t, a_2 a_1))$; $T^y = t_0 t$; $T^z = r_0 r a_2 a_1$.

After |r| simulations of h we obtain the promised instantaneous description
within an amount of time

$m + |r| \max(|\lceil g \rceil|, |\lceil h \rceil|) B_\beta(n + l_\alpha) \le m + |r| c B_\beta(n + l_\alpha) \le m + cB_\alpha(n + l_\alpha)$

where, since $\alpha \ge 2$, in these evaluations we may compensate the quadratic
amount of time needed to copy r and its digits with the difference between c
and $\max(|\lceil g \rceil|, |\lceil h \rceil|)$.

Case 2. / =cdiag(/i) G T\. We have h G T\-^ and (recall that Ai = 1 when 
A < 



B\^ (n + 1) + (^ + 1) < ^A„+i (^ + 1) — B\(n +1). (1) 

INT computes h{r), understands from the mark DG that the result is the code 
for the function to be computed, and, accordingly, pushes it into Tf. 
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To compute h{r) and {h{r)}{s,t,r) the interpreter INT needs, by the inductive 
hypothesis, time m + | |"/i] |i?Ai (Ifl + 1) + + 1) < (by (l))m + 

\\h] \B\{n + 1) < m + cB\{n + 1). 

Reduction to two tapes. The interpreter we have just defined uses a number
of tapes d that depends on the construction of the simulated function. We now
use the general procedure shown in [9] to reduce a k-tape TM bounded by T(n)
to a 2-tape TM bounded by kT(n) log T(n); thus, we obtain a 2-tape interpreter
bounded by

$T(n) = kB_\alpha(n + 1) \log B_\alpha(n + 1) \le kB_\alpha(n + 2) \log(n + 1) \le B_\alpha(n + 4)$



4.2 Simulation of TM’s by Functions

Lemma 3. For all $1 \le \alpha < \epsilon_0$, $\mathrm{DTIMEF}(B_\alpha(n)) \subseteq T_\alpha$.

Proof. Let M be a TM in $\mathrm{DTIMEF}(B_\alpha(n))$. There exists a function
$nxt_M \in T_0$ such that, on input the code of an instantaneous description of M,
$nxt_M$ returns the code of the next description. We define the following
function:

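$\sigma(s, \lceil \alpha \rceil) := s \lceil nxt_M \rceil$ if $\alpha = 0$

$\sigma(s, \lceil \alpha \rceil) := \sigma(s, \lceil \beta \rceil)\,\sigma(s, \lceil \beta \rceil)\,\lceil \mathrm{SREC} \rceil$ if $\alpha = \beta + 1$

$\sigma(s, \lceil \alpha \rceil) := \sigma(s, \lceil \alpha_1 \rceil)\,\sigma(s, \lceil \alpha_{|s|} \rceil)\,\lceil \mathrm{SREC} \rceil \lceil \mathrm{CDIAG} \rceil$ if $\alpha$ is a limit ordinal

We prove by induction on $\alpha$ that the function whose code is generated by
$\sigma$ is in $T_\alpha$.

Base. $\alpha = 0$. We have that $nxt_M \in T_0$, by hypothesis.

Step. Case 1. $\alpha = \beta + 1$. We have that
$\{\sigma(s, \lceil \alpha \rceil)\} = \mathrm{SREC}(\sigma(s, \lceil \beta \rceil), \sigma(s, \lceil \beta \rceil))$.
The function is in $T_\alpha$ by induction on $\alpha$ and definition 17.

Case 2. $\alpha = \lambda$. We have that
$\{\sigma(s, \lceil \lambda \rceil)\} = \mathrm{CDIAG}(\mathrm{SREC}(\sigma(s, \lceil \lambda_1 \rceil), \sigma(s, \lceil \lambda_{|s|} \rceil)))$.
This function is in $T_\lambda$, since $\{\sigma(s, \lceil \lambda_1 \rceil)\} \in T_{\lambda_1}$
and, for all s, $\{\sigma(s, \lceil \lambda_{|s|} \rceil)\} \in T_{\lambda_{|s|}}$.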




The function $\sigma$ writes the code of a function which iterates $nxt_M$ for
$B_\alpha(n)$ times; on input the code of the initial configuration of M, this
function returns the code of the final configuration.

Note that, given the code of a limit ordinal $\lambda$, we need at most a
quadratic amount of time to return the code of $\lambda_n$.

4.3 Time-Space Classes 

Lemma 4. For all f in $S_p$, we have $|f(s, t, r)| \le \max(|s|, |t|, |r|)$.

Proof. By induction on p. Base. $f \in S_1$.

Case 1. f is defined by iteration of a function g in $S_0$; we have, by
induction on r, $|f(s, a)| = |s|$, and
$|f(s, ra)| = |g(f(s, r))| \le |f(s, r)| \le \max(|s|, |r|)$.

Case 2. f is defined by simple scheme or CMP. The result follows by the inductive
hypothesis.

Step. Given $f \in S_{p+1}$ defined by SREC in functions g and h in $S_p$, we have

$|f(s, t, a)| = |g(s, t)|$ by definition of f

$\le \max(|s|, |t|)$ by inductive hypothesis,

and

$|f(s, t, ra)| = |h(f(s, t, r), t, ra)|$ by definition of f

$\le \max(|f(s, t, r)|, |t|, |ra|)$ by inductive hypothesis on h

$\le \max(\max(|s|, |t|, |r|), |t|, |ra|)$ by induction on r

$\le \max(|s|, |t|, |ra|)$.

Lemma 5. $TS_{qp} \subseteq \mathrm{DTIMESPACEF}(n^p, n^q)$.

Proof. Let f be a function in $TS_{qp}$. By definition 20, f is defined by weak
substitution of a function $h \in T_q$ into a function $g \in S_p$, that is,
$f(s, t, r) = g(h(s, t, r), t, r)$. Theorem 1 states that there exists an
interpreter INT computing the values of h within time $n^q$, and computing the
values of g within time $n^p$. Lemma 4 holds for g, since g belongs to $S_p$;
thus, the space needed by INT to compute g is at most n.

Define now a TM M that, on input $\lceil f \rceil$, s, t, and r performs the
following steps (recall lemma 2 for the definition of INT):

(1) it pushes $\lceil g \rceil \lceil h \rceil$ into the tape $T^f$ of INT, which
contains the codes of the functions that the interpreter will compute;

(2) it calls INT on input $\lceil g \rceil \lceil h \rceil$, s, t, r.

The time complexity of (1) is linear in the length of $\lceil f \rceil$; in (2),
INT needs time equal to $n^q$ to compute h, and needs only $n^p$ (and not
$n^{pq}$) to compute g. This happens because h(s, t, r) is computed in the safe
position, and this implies that its length does not affect the number of steps
performed by the second call to INT. In fact, INT never moves the content of a
safe position into the tapes whose values play the role of recursive counters;
they depend only on n, the length of the original input. Thus, the overall time
bound is $n^q + n^p$, which can be reduced to $n^p$, since $q < p$.

INT requires space $n^q$ to compute the value of h on input s, t, r; as we noted
above, the space needed for the computation of g is linear in the length of the
input, and thus the overall space needed by M is still $n^q$.

Lemma 6. $\mathrm{DTIMESPACEF}(n^p, n^q) \subseteq TS_{qp}$.

Proof. Let M be a TM in $\mathrm{DTIMESPACEF}(n^p, n^q)$. This means that the
computation of M is time-bounded by $n^p$ and, simultaneously, it is
space-bounded by $n^q$. M can be simulated by the composition of two TM’s, $M_g$
and $M_h$, with $M_h \in \mathrm{DTIMEF}(n^q)$ and
$M_g \in \mathrm{DTIMESPACEF}(n^p, n)$: the former constructs (within polynomial
time) the space that the latter will successively use in order to simulate M.

By theorem 1 there exists a function $h \in T_q$ which simulates the behaviour of
$M_h$, and there exists a function $g \in S_p$ which simulates the behaviour of
$M_g$; in particular, there exists the function $nxt_g \in T_0$. Note that $nxt_g$
belongs to $S_0$, since it never adds a digit to the description of $M_g$ without
erasing another one. We define the function $\sigma_0 := \mathrm{ITER}(nxt_g)$, and
the sequence $\gamma_{n+1} := \mathrm{SREC}(\sigma_n, \sigma_n)$, with
$\sigma_{n+1} := \mathrm{IDT}_2(\gamma_{n+1})$.

We have that

$\gamma_1(s, t, a) = \mathrm{ITER}(nxt_g)(s, t)$

$\gamma_1(s, t, ra) = \mathrm{ITER}(nxt_g)(\gamma_1(s, t, r), t)$

and

$\gamma_{n+1}(s, t, a) = \sigma_n(s, t)$

$\gamma_{n+1}(s, t, ra) = \sigma_n(\gamma_{n+1}(s, t, r), t, ra)$

We can easily see that $\sigma_0 \in S_1$, by definition of this class and, again
by definition 18, that $\sigma_n \in S_{n+1}$.

Given the code s of the initial ID of M, we can define
$sim_M(s) = \sigma_{p-1}(h(s), s)$, which simulates the behaviour of M. This
function is defined in $TS_{qp}$.
Abstract. If several keys are inserted into a search structure (or deleted
from it) at the same time, it is advantageous to sort the keys and per- 
form a group update that brings the keys into the structure as a single 
transaction. A typical application of group updates is full-text index- 
ing for document databases. Then the words of an inserted or deleted 
document, together with occurrence information, form the group to be 
inserted into or deleted from the full-text index. In the present paper 
a new group update algorithm is presented for red-black search trees. 
The algorithm is designed in such a way that the concurrent use of the 
structure is possible. 



1 Introduction 

If a large number of keys are to be inserted into a database index at one time, 
then it is important for efficiency that the keys are sorted and inserted into the 
index tree as a group. In this way, it is not necessary to traverse the whole path 
from the root to the leaf when performing each insertion. 

Typical applications where group operations are needed are databases in 
which large collections of data are stored at one time, such as document databases 
or WWW search engines [13, 14]. In such systems, full-text indexing is applied 
for term search. In the inverted-index technique, for each index term an occur- 
rence list is created that includes all documents the term appears in. The terms 
themselves are organized as a search structure, typically as a tree. When insert- 
ing a new document, an update of the inverted index is required for each term in 
the inserted document [4, 5]. This can be a long transaction, and without con- 
currency intolerable delays can occur if searches in the database are frequent. To 
overcome this problem concurrent group update algorithms have been designed 
[13]. 

Other applications arise, e.g., when updates occur in bursts and are collected 
into groups that are merged in certain intervals with the main index [7], and 
when large indexes are constructed on-line, i.e., during the construction of the 
index, new records may be inserted into the file to be indexed [11, 16]. To finish 
the indexing, a group update is created from the updates which occurred during 
the main construction. 



G. Bongiovanni, G. Gambosi, R. Petreschi (Eds.): CIAC2000, LNCS 1767, pp. 253-262, 2000. 
In this paper we present an efficient group update algorithm for red-black
search trees [6]. The algorithm has two steps: First, the operations in the under- 
lying group are performed without any balancing, except for subgroups between 
two consecutive keys in the original tree. In this way the updates are made avail- 
able as soon as possible without sacrificing the logarithmic search time. In the 
second step, the tree will be balanced, i.e., transformed into a tree satisfying the 
(local) balance criteria of red-black trees. Balancing is designed as a background 
process allowing the concurrent use of the structure. The balancing time is com- 
parable with earlier results in cases when balancing is strictly connected with 
individual updates. 

Recently, a group insertion algorithm for AVL-trees [1] was presented in [10]. 
The motivation for the new algorithm is twofold. First, red-black trees are more
efficient than AVL-trees as regards the number of rebalancing operations needed 
for updates, see [8], for example. Thus, it is important to develop group update 
algorithms for red-black trees, not only for AVL-trees. Moreover, our new imple- 
mentation of group updates for red-black trees has an important advantage over 
the implementation given in [10]. The algorithm given in [10] is based on height- 
valued trees defined in [9]. This implies that even when a subgroup consists of 
a single key and no rebalancing is needed, all nodes in the search path must 
be checked for imbalance and the balance information must be stored. Our new 
algorithm is based on a generalization of red-black trees, called chromatic trees 
[12] or relaxed red-black trees [8], and no unnecessary checking for imbalance 
and restoring balance information is needed. 



2 Chromatic Trees 

We shall consider leaf-oriented binary search trees, which are full binary trees
(each node has either two or no children) with the keys stored in the leaves. The
internal nodes contain routers, which guide the search through the tree. The
router stored in a node v must be greater than or equal to any key stored
in the leaves of v’s left subtree and smaller than any key in the leaves of v’s
right subtree.

We define a chromatic tree as a relaxed version of a red-black tree. Instead
of using the two colors, red and black, we use weights. Each node in the tree
has a non-negative integer weight. We refer to nodes with weight zero as red
and nodes with weight one as black. If a node has a larger weight, we call it
overweighted, and its amount of overweight is its weight minus one. The
only requirements in this chromatic tree are that all paths from the root to a
leaf have the same weight and that leaves have a weight of at least one.
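The two requirements on a chromatic tree, and the two kinds of conflicts that separate it from a red-black tree, can be checked with a short recursive sketch. The `Node` class and the function names are hypothetical and chosen only for this illustration; routers and keys are left out because only the weights matter here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    weight: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        # full binary tree: a node has either two children or none
        return self.left is None


def path_weight(node: Node) -> int:
    """Return the common weight of all paths from `node` to a leaf.

    Raises ValueError if the chromatic-tree requirements are violated.
    """
    if node.weight < 0:
        raise ValueError("negative weight")
    if node.is_leaf:
        if node.weight < 1:
            raise ValueError("leaf with weight below one")
        return node.weight
    left, right = path_weight(node.left), path_weight(node.right)
    if left != right:
        raise ValueError("paths of different weight")
    return node.weight + left


def conflicts(node: Node, parent_red: bool = False):
    """Count (red-red conflicts, overweight conflicts) in the subtree."""
    red = node.weight == 0
    red_red = 1 if red and parent_red else 0   # assigned to the lower node
    overweight = 1 if node.weight > 1 else 0
    if not node.is_leaf:
        for child in (node.left, node.right):
            rr, ow = conflicts(child, red)
            red_red += rr
            overweight += ow
    return red_red, overweight
```

A tree for which `path_weight` succeeds and `conflicts` returns `(0, 0)` satisfies the red-black conditions in the sense used here.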

In a chromatic tree two consecutive red nodes are referred to as a red-red 
conflict (which is assigned to the lower of both nodes), and an overweighted node 
as an overweight conflict. A red-black tree can be defined as a chromatic tree 
without any conflicts. The purpose of the rebalancing operations is to transform 
a chromatic tree into a red-black tree by removing all conflicts. 
Operations for the updates (insertion and deletion), as well as for rebalancing,
can be found in [2, 12]. Note that all operations preserve the tree as a chromatic
tree, i.e., leaves are not red and all paths have the same weight.

3 Group Insertion 

The overall structure of an efficient group update for red-black trees is the same 
as for AVL trees [10]. For insertions, this essentially amounts to inserting an 
ordered set of keys, starting at the root of the tree, and propagating subsets 
down the tree to the appropriate insertion locations. For a leaf search tree, we 
arrive at a set of leaves, with a set of keys to be inserted at each of these leaves. 

Thereafter, each set of keys to be inserted is turned into a (usually small) 
red-black tree. With a certain number of rebalancing operations that arise from 
applying the rebalancing scheme of chromatic trees, the balance condition is 
restored. 

As a result of a group insertion, a node of the tree may also get negative
units of overweight. Nodes that have negative weights are called underweighted
nodes. An underweighted node p corresponds to a red node, the color of which
additionally stores how many surplus black nodes lie on each search path in
the subtree rooted at p compared with the rest of the tree.

The motivation for using underweighted nodes is to expedite the rebalancing,
since the nodes of a new subtree $T_i$ that is inserted by the group insertion
can be colored red-black during the creation of $T_i$ without much effort. On the
other hand, if all internal nodes of $T_i$ are colored red analogously, as by a
sequence of single insertions, the red-black coloring of $T_i$ must be part of
the rebalancing. Because the color of at least each second node on a path in
$T_i$ must then be changed, $\Omega(m)$ transformations are needed.

Let T be a chromatic tree of size n that fulfills the balance conditions of
red-black trees and K a group of m sorted keys. K is split into b subgroups
$K_i$ (i = 1, ..., b) of $m_i$ keys so that the search for each of the keys of
$K_i$ ends at the same leaf $l_i$ of T. Let $l_i$ store key $k_i$.
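The split of a sorted group into subgroups, one per destination leaf, can be sketched as below. This is a simplified model rather than the paper's procedure: the tree is represented only by the sorted list of its leaf keys, and it is assumed that a search for a key ends at the leaf holding the smallest stored key that is not smaller (or at the last leaf if there is none). The name `split_group` is hypothetical.

```python
from bisect import bisect_left
from itertools import groupby


def split_group(leaf_keys, group):
    """Map each destination leaf index to the sorted keys whose search ends there.

    leaf_keys: sorted keys stored in the leaves of the tree.
    group:     sorted keys to be inserted.
    """
    last = len(leaf_keys) - 1

    def target(key):
        # index of the leaf at which the search for `key` ends (assumed rule)
        return min(bisect_left(leaf_keys, key), last)

    # `group` is sorted, so keys with the same target are consecutive
    return {leaf: list(keys) for leaf, keys in groupby(group, key=target)}
```

For example, `split_group([10, 20, 30], [1, 5, 12, 25, 27, 40])` yields three subgroups, one for each of the three leaves.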

Operation group insertion:

Step 1 For all i = 1, ..., b construct a balanced red-black tree $T_i$ that stores
the $m_i$ keys of the i-th subgroup and the key $k_i$. The root of $T_i$ gets the
underweight $1 - w_i$, where $w_i$ is the number of black nodes on each path
from the children of the root to the leaves.

Step 2 For all i = 1, ..., b replace leaf $l_i$ of T by the tree $T_i$. Denote
the resulting tree by T'.

Step 3 Rebalance T' by small local transformations.

In order to construct $T_i$ in $O(m_i)$ time, a leaf-oriented AVL-tree [1] is
generated from the set of $m_i + 1$ keys by applying a simple divide-and-conquer
algorithm. During the construction the nodes are colored red and black using
the criterion by Guibas and Sedgewick [6].

In order to rebalance T', essentially the set of rebalancing transformations of
the chromatic tree is used [2]. The underweights only serve for storing
information about accumulated insertions. During the rebalancing they are
transformed into red-red conflicts and overweight conflicts again.

An underweighted node p is handled as follows. If p is the root of the tree, 
then the underweight conflict is resolved by coloring p black (cf. Figure 1, u- 
root). Otherwise, assume that p has weight w{p) = —g, and the sibling q of p 
has weight greater than or equal to —g. Then p is colored red, the weight of p’s 
parent is decreased by g, and the weight of q is increased by g (cf. Figure 1, 
reversal). 
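The weight changes of the reversal transformation can be written as a small function on the three weights involved. The function name `reversal` and its tuple interface are assumptions of this sketch; it only mirrors the rule stated above and does not touch the tree structure.

```python
def reversal(parent_w, p_w, q_w):
    """Return the new (parent, p, q) weights after a reversal at p.

    p is underweighted with weight -g, and its sibling q has weight >= -g.
    """
    g = -p_w
    if g <= 0:
        raise ValueError("p is not underweighted")
    if q_w < -g:
        raise ValueError("sibling weight must be at least -g")
    # p becomes red (weight 0), the parent loses g, the sibling gains g
    return parent_w - g, 0, q_w + g
```

The sum of weights along any path through the parent is unchanged: through p it is `parent_w - g` before and after, and through q it is `parent_w + q_w` before and after, so the tree keeps equal path weights.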



[Figure 1 (panels: u-root, reversal)]

Fig. 1. Handling of an underweight conflict. Symmetric cases are omitted. (A 
label beside a node denotes the weight of the node. In order to represent the 
colors of the nodes more clearly, additional black or overweighted nodes are filled 
and red or underweighted nodes are unfilled. Half filled nodes have an arbitrary 
weight.) 



If the parent of p is black or overweighted before applying the reversal
transformation, then the reversal transformation resolves at least one unit of
underweight from the tree. Otherwise, the underweight is only shifted from p to
its parent. By a reversal transformation to handle the underweight of p, the
sibling q of p may become overweighted, $w(q) \le g$. Furthermore, a constant
number of red-red conflicts may be generated by a reversal transformation (at
the nodes which are colored red and at the red children of these nodes).

In the following we analyze the costs of rebalancing T'. Let $r_i$ (i = 1, ..., b)
be the underweighted nodes of T' with weights $w(r_i) = -g_i < 0$. The $r_i$ are
the roots of the new subtrees $T_i$ created by the group insertion.

The i-th branching node (i = 1, ..., k - 1) of a group search path from the
root to nodes $q_1, \ldots, q_k$ is defined as the nearest common ancestor of
$q_i$ and $q_{i+1}$. A stopping node is a branching node, both subtrees of which
contain red-red conflicts.

Lemma 1. At most $2 \sum_{i=1}^{b} g_i$ reversal transformations are needed to
resolve the $\sum_{i=1}^{b} g_i$ units of underweight from T'.

Proof. The rebalancing of T' is done along the group search path to the nodes
$r_1, \ldots, r_b$. We start with transforming the underweights into red-red and
overweight conflicts by using reversal transformations. This transformation is
always done in bottom-up direction. That means a reversal transformation is
applied at an underweighted node p only if the subtrees rooted at p and its
sibling q do not contain any underweighted nodes except for p and q. The
question whether those two subtrees contain underweighted nodes can be answered
locally by marking the branching nodes during the search phase of the group
insertion and for each $T_i$ storing the information as to whether the
underweight still exists.

Since T fulfilled the balance conditions of red-black trees before the group
insertion, at least every second node on the path from the root to a node $r_i$
is black in T'. Therefore, at most $2g_i$ reversal transformations are needed to
resolve the underweight of $r_i$.

If the underweights of two nodes $r_i$ and $r_{i+1}$ meet at children of a
branching node, both underweights are handled by the same reversal
transformation. Thereby one of the underweights is always resolved (cf.
Figure 1). □



Let us now consider the situation after performing the reversal transformations.
Since each reversal transformation that handles an underweight which was
originally created at $r_i$ may generate only a constant number of red-red
conflicts and $g_i$ units of overweight, it follows:

Lemma 2. The number of red-red conflicts generated by the reversal
transformations is $O(\sum_{i=1}^{b} g_i)$, and the number of units of overweight
is $O(\sum_{i=1}^{b} g_i^2)$.

In contrast to underweights and overweights, red-red conflicts must always
be handled in a top-down manner. Thus, the rebalancing of the red-red conflicts
starts with the top-most red-red conflict on the group search path from the root
to the nodes $r_1, \ldots, r_b$. We demand that during the rebalancing of the
red-red conflicts the following condition must be guaranteed:

Condition: Whenever the parent or the grandparent of a node has a red-red 
conflict which is handled as a stopping node, then both children of this 
stopping node must be red. 

The Condition is motivated by the following idea: a rebalancing transformation 
should be carried out at a branching node only if either red-red conflicts of 
both subtrees have been bubbled up to this node, so that they can be handled 
together, or if one of the subtrees contains no red-red conflicts any more. 

Lemma 3. The number of transformations needed to resolve all red-red conflicts
is $O(\sum_{i=1}^{b} g_i + L)$, where L is the number of different nodes on the
group search path from the root to $r_1, \ldots, r_b$.

Proof (sketch). Let k be the number of red-red conflicts. l denotes the number
of pairs of red siblings where at least one of the siblings lies on the group
search path from the root to $r_1, \ldots, r_b$. Let v be the number of nodes p
with red-red conflicts that have a distance greater than one to the nearest
stopping node that is an ancestor of p. We define $\Phi := 5k + 2l + v \ge 0$.

By a simple case analysis it can be verified that, by performing a rebalancing
transformation to handle a red-red conflict, $\Phi$ is always decreased if,
during the handling of the red-red conflicts, the Condition is guaranteed.
Therefore, at most $5k + 2l + v$ rebalancing transformations are needed to
resolve all red-red conflicts. Since k and v are in $O(\sum_{i=1}^{b} g_i)$, cf.
Lemma 2, and $l \le L$, the claim follows. □

Theorem 1. $O(\sum_{i=1}^{b} \log^2 m_i)$ rotations and $O(\sum_{i=1}^{b} \log^2 m_i + L)$ color changes
are needed to rebalance $T'$, where $L$ is the number of different nodes on the group
search path from the root to $r_1, \ldots, r_b$.

Proof. After inserting the subtrees $T_1, \ldots, T_b$ into $T$, the tree contains $b$ under-
weighted nodes $r_1, \ldots, r_b$ with weights $w(r_i) = -g_i$ ($i = 1, \ldots, b$), where $g_i$ is in
$O(\log m_i)$, since the subtree $T_i$ rooted at $r_i$ contains $O(\log m_i)$ black nodes on
each search path.

First, the $\sum_{i=1}^{b} g_i$ units of underweight are transformed into $O(\sum_{i=1}^{b} g_i)$
red-red conflicts and $O(\sum_{i=1}^{b} g_i^2)$ units of overweight (Lemma 2) at at most
$2 \sum_{i=1}^{b} g_i$ overweighted nodes by using $\sum_{i=1}^{b} g_i = O(\sum_{i=1}^{b} \log m_i)$ reversal
transformations (Lemma 1). Then the red-red conflicts are resolved from the
tree by using $O(\sum_{i=1}^{b} g_i) = O(\sum_{i=1}^{b} \log m_i)$ rotations and $O(\sum_{i=1}^{b} \log m_i + L)$
color changes (Lemma 3). Finally, the overweight conflicts are handled anal-
ogously as in Step 2 of a group deletion (cf. the following section). For this,
$O(\sum_{i=1}^{b} g_i^2) = O(\sum_{i=1}^{b} \log^2 m_i)$ rotations and $O(\sum_{i=1}^{b} g_i^2 + L^*)$ color changes
are necessary, where $L^*$ is the number of different nodes on the group search
path from the root to the overweighted nodes.

Before handling the red-red conflicts $L^* \le L + 2 \sum_{i=1}^{b} g_i$, because all of the
at most $2 \sum_{i=1}^{b} g_i$ overweighted nodes are siblings of nodes on the group search
path from the root to $r_1, \ldots, r_b$. Since each of the $O(\sum_{i=1}^{b} g_i)$ rotations needed to
handle the red-red conflicts increases $L^*$ by at most one, $L^*$ is in $O(L + \sum_{i=1}^{b} g_i)$.
So, the number of color changes needed to resolve all overweights from the tree is
bounded by $O(\sum_{i=1}^{b} \log^2 m_i + L)$. Therefore, after performing Steps 1 and 2 of a
group insertion, the tree can be rebalanced by using $O(\sum_{i=1}^{b} \log^2 m_i)$ rotations
and $O(\sum_{i=1}^{b} \log^2 m_i + L)$ color changes. □

Assuming that each path in $T$ from the root to a leaf contains the same
number of black nodes, it can be shown analogously as for 2-3-trees [3] that the
number $L$ of different nodes on the group search path is $O(\log n + \sum_i \log d_i)$,
where $d_i$ denotes the number of leaves between $l_i$ and $l_{i+1}$.

4 Group Deletion 

Let $T$ be a chromatic tree of size $n$ that fulfills the balance conditions of red-black
trees and $K$ a group of $m$ sorted keys. $K$ is split into $b$ subgroups $K_i$ ($i = 1, \ldots, b$)
of $m_i$ keys so that for each $K_i$, the tree $T$ contains a subtree $T_i$ that stores all
keys of $K_i$ plus an additional key $k_i$ at its leaves.

Operation group deletion: 

Step 1 For all $i = 1, \ldots, b$ replace the subtree $T_i$ by a leaf $l_i$ that stores the key
$k_i$. $l_i$ gets weight $w_i$, where $w_i$ is the sum of weights on each path from the
root of $T_i$ to a leaf. Denote the resulting tree by $T'$.

Step 2 Rebalance T' by small local transformations. 
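
Step 1 of the operation can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the `Node` class, `path_weight` and `collapse_subtree` are hypothetical names, and the sketch assumes that the subtree $T_i$ is balanced, so that every root-to-leaf path carries the same weight sum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Node of a leaf-oriented chromatic tree (hypothetical representation)."""
    key: int
    weight: int  # 0 = red, 1 = black, >1 = overweighted
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def path_weight(root: Node) -> int:
    """Sum of weights on the leftmost root-to-leaf path.

    In a balanced subtree every root-to-leaf path has the same sum,
    so following a single path is sufficient (assumption of this sketch).
    """
    total, node = 0, root
    while node is not None:
        total += node.weight
        node = node.left
    return total


def collapse_subtree(subtree_root: Node, extra_key: int) -> Node:
    """Step 1 for one subgroup: replace T_i by a leaf l_i that stores k_i."""
    return Node(key=extra_key, weight=path_weight(subtree_root))
```

A leaf created this way has weight greater than one whenever $T_i$ contained more than one black level, which is exactly the overweight that Step 2 has to remove.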

In order to estimate the costs of rebalancing $T'$, first we consider the following
situation: Let $p$ be an overweighted node, of which the sibling $q$ is the root of
a balanced subtree that contains no overweights. (After performing a group
deletion, this situation occurs at $p = l_i$, for example, if the sibling $q$ is not equal to
$l_{i-1}$ or $l_{i+1}$ respectively.) Denote the parent of $p$ and $q$ by $u$. Let $w(p) = g + 1 \ge 2$
be the weight of $p$.

Lemma 4. At most $2g$ push transformations plus $g$ further rebalancing trans-
formations are needed in order to transform $T^u$ into a balanced red-black tree
$T^{u'}$. Afterwards the root $u'$ of $T^{u'}$ has weight $w(u') \le w(u) + 1$.

Proof (sketch). In order to decrease $w(p)$ from $g + 1$ to 1, $g$ rebalancing trans-
formations are carried out. Thereby the subtree $T^u$ rooted at $u$ is replaced by a
subtree $T^{u^*}$ rooted by $u^*$ (cf. Figure 2).



Fig. 2. Rebalancing $T^u$. (Figure labels: $w(p) = g + 1 \ge 2$; $\le 2g$ transformations; $w(u') \le w(u) + 1$.)



By observing the weight-balancing transformations, it can be shown that,
thereby, at most half of the $g$ units of overweight are resolved. The remaining $k$
overweight conflicts are spread out over the path from $u^*$ to $p$, i.e., except for
$u^*$, all nodes on the path from $u^*$ to $p$ (of length $\le 2(g - k) + 2$) have a weight
less than or equal to two (see Figure 2).

Then the overweights on the path from $u^*$ to $p$ are handled in bottom-up
direction by using at most $2g - k$ push transformations and $k$ further rebalancing
transformations. Thereby, $T^{u^*}$ is replaced by a subtree $T^{u'}$ rooted by $u'$. Only
one of the $k$ units of overweight may remain at $u'$.

Since the set of the $g$ rebalancing transformations that transform $T^u$ into $T^{u^*}$
contains at most $k$ push transformations, in total at most $2g$ push transformations
plus $g$ further rebalancing transformations are needed in order to rebalance $T^u$.
Afterwards $w(u') \le w(u) + 1$. □



Theorem 2. $O(\sum_{i=1}^{b} \log m_i)$ rotations and $O(\sum_{i=1}^{b} \log m_i + L)$ color changes
are needed to rebalance $T'$, where $L$ is the number of different nodes on the group
search path from the root to the overweighted leaves $l_1, \ldots, l_b$.

Proof. For $i = 1, \ldots, b$ let $w(l_i) = g_i + 1$.

In order to avoid unnecessary work during the rebalancing, we slightly modify 
the w7 transformation [2], which handles overweight conflicts at sibling nodes. 
Instead of resolving only one unit of overweight as w7 does, w7* resolves the 
maximum number of overweight conflicts at a time (cf. Figure 3). 



Fig. 3. Modified (w7)-transformation.



The rebalancing of $T'$ is always done in bottom-up direction. That means an
overweight conflict at a node $p$ is handled only if the subtrees rooted at $p$ and
its sibling $q$ both are balanced red-black trees that do not contain overweighted
nodes except for $p$ and $q$. Such a node $p$ always exists in $T'$, if the group deletion
has not removed all leaves of the tree. At the beginning of the rebalancing, $p$ is
one of the leaves $l_i$ and $q$ may be $l_{i-1}$ or $l_{i+1}$ respectively.

The question as to whether two subtrees rooted at sibling nodes are bal-
anced can be answered locally by marking the branching nodes during the search
phase of the group deletion and by storing for each $l_i$ the information whether
overweight-conflicts generated at $l_i$ still exist in the tree.

Let $p$ be an overweighted node so that the subtree rooted at $p$ and the subtree
rooted at $p$'s sibling $q$ are both balanced. Without loss of generality $w(p) \ge w(q)$.
Let $u$ be the parent of $p$ and $q$. $T^u$ denotes the subtree rooted at $u$.

Case 1: [$w(q) \le 1$ and $w(p) = 2$] One single transformation is sufficient
either to resolve the overweight conflict at $p$ or to shift it to $p$'s parent $u$.

Case 2: [$w(q) \le 1$ and $w(p) := g_p + 1 > 2$] In this case, the subtree $T^u$ is
rebalanced analogously as described in the proof of Lemma 4 by using at most
$3g_p$ transformations. Thereby, at least $g_p - 1$ units of overweight are removed
from the tree. $T^u$ is replaced by the subtree $T^{u'}$ and the weight of $u'$ is less than or
equal to $w(u) + 1$.

Case 3: [$w(q) := g_q + 1 \ge 2$] By performing a w7* transformation, the $g_q$
units of overweight are removed from $q$, and $g_q$ units of overweight are shifted
from $p$ to $p$'s parent $u$.
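
The case distinction above depends only on the two sibling weights, so it can be written as a small dispatch function. The sketch below is an illustration under that reading of the proof; `overweight_case` and `transformation_budget` are hypothetical helper names, and the budgets are the upper bounds stated in the three cases, not measured costs.

```python
def overweight_case(w_p: int, w_q: int) -> int:
    """Return which case (1, 2 or 3) applies to siblings p and q.

    Assumes p is overweighted (w_p >= 2) and, without loss of
    generality, w_p >= w_q, as in the proof.
    """
    if w_p < 2 or w_p < w_q:
        raise ValueError("expected w(p) >= 2 and w(p) >= w(q)")
    if w_q >= 2:
        return 3  # both siblings overweighted: one w7* transformation
    return 1 if w_p == 2 else 2


def transformation_budget(w_p: int, w_q: int) -> int:
    """Upper bound on the number of transformations used by the case."""
    case = overweight_case(w_p, w_q)
    if case == 2:
        g_p = w_p - 1
        return 3 * g_p  # at most 3 * g_p transformations (cf. Lemma 4)
    return 1  # Case 1 and Case 3 each use a single transformation
```

For example, siblings with weights 5 and 1 fall into Case 2 with $g_p = 4$, so the bound is 12 transformations, while weights 4 and 3 fall into Case 3.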

The rebalancing of $T'$ is now done as follows. We start at a leaf $l_i$ and
apply Case 1 and Case 2 until all overweight conflicts are resolved locally
or, respectively, the child of a branching node is reached. Then the overweight
conflicts at another $l_j$ are handled analogously. If both children of a branching
node become overweighted, Case 3 applies. If only one subtree of a branching
node contains overweights, these overweight conflicts are handled by applying
Case 1 and Case 2. Then at most one unit of overweight may remain at the
branching node. In both cases, afterwards an overweight conflict at the branching
node is handled analogously as at one of the leaves $l_i$.

Because $T'$ contains $\sum_{i=1}^{b} g_i$ units of overweight, the number of rotations
needed to rebalance $T'$ is $O(\sum_{i=1}^{b} g_i) = O(\sum_{i=1}^{b} \log m_i)$.

Each time Case 2 or Case 3 applies, the number of overweight conflicts is
reduced. Thereby, $O(\sum_{i=1}^{b} \log m_i)$ transformations are performed. In Case 1 ei-
ther the overweight conflict is resolved or it is shifted along the search path
towards the root. Thus, Case 1 applies $O(L)$ times. Therefore, the number of
color changes needed to rebalance $T'$ is $O(\sum_{i=1}^{b} \log m_i + L)$. □

5 Conclusions 

There are applications in which a large number of updates for a search structure 
is created in a very short time, for example by measuring equipment, or when a 
document is inserted into a document database. It is often important that such 
a group is brought into the structure as fast as possible, and that during this 
group update the concurrent use of the structure is allowed. 

Our work in the present paper is along the lines of [10], where a group 
insertion algorithm for AVL-trees was presented and analyzed. The novelty of 
the present paper is that we consider red-black binary trees, and that we obtain 
a group update algorithm that is more efficient than the one given in [10] in the 
following sense: In the algorithm of [10], all nodes in the group search path must 
be marked as unbalanced nodes and the balance in these nodes must be restored 
during the rebalancing phase of the algorithm. Our new algorithm is able to 
restrict the balance restoring to those nodes that have gotten out of balance 
because of the group update. 

One important aspect of group updates not considered in the present paper 
is recovery, i.e., the question of how a valid search structure can be efficiently 
restored after a possible failure during a group update. These questions have
been considered in [15] for AVL-trees, and we believe that similar methods can
be applied to red-black trees.

Abstract. We show SVP$_\infty$ and CVP$_\infty$ to be NP-hard to approximate to
within $n^{c/\log\log n}$ for some constant $c > 0$. We show a direct reduction
from SAT to these problems, that combines ideas from [ABSS93] and
from [DKRS99], along with some modifications. Our result is obtained
without relying on the PCP characterization of NP, although some of
our techniques are derived from the proof of the PCP characterization
itself [DFK+99].

1 Introduction 

Background 

A lattice $L = L(v_1, \ldots, v_n)$, for linearly independent vectors $v_1, \ldots, v_n \in \mathbb{R}^k$, is the
additive group generated by the basis vectors, i.e. the set $L = \{\sum_i a_i v_i \mid a_i \in \mathbb{Z}\}$.
Given $L$, the Shortest Vector Problem (SVP$_p$) is to find the shortest non-zero
vector in $L$. The length is measured in the $\ell_p$ norm ($1 \le p \le \infty$). The
Closest Vector Problem (CVP$_p$) is the non-homogeneous analog, i.e. given $L$ and
a vector $y$, find a vector in $L$ closest to $y$.
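
To make the two definitions concrete for the maximum norm, the following sketch enumerates integer combinations of a small basis by brute force. It is only an illustration: the function names are hypothetical, and the search is restricted to coefficients in $[-\mathit{bound}, \mathit{bound}]$, which is an assumption that is adequate for toy instances but does not yield an algorithm for the general problems.

```python
from itertools import product


def lattice_vector(basis, coeffs):
    """Integer combination sum_i a_i * v_i of the basis vectors."""
    dim = len(basis[0])
    return tuple(sum(a * v[j] for a, v in zip(coeffs, basis)) for j in range(dim))


def linf_norm(v):
    """Maximum norm of an integer vector."""
    return max(abs(x) for x in v)


def shortest_vector_linf(basis, bound=3):
    """Shortest non-zero lattice vector among coefficients in [-bound, bound]."""
    best = None
    for coeffs in product(range(-bound, bound + 1), repeat=len(basis)):
        if not any(coeffs):
            continue  # the zero vector is excluded by definition
        v = lattice_vector(basis, coeffs)
        if best is None or linf_norm(v) < linf_norm(best):
            best = v
    return best


def closest_vector_linf(basis, target, bound=3):
    """Lattice vector closest to target among coefficients in [-bound, bound]."""
    def dist(v):
        return max(abs(a - b) for a, b in zip(v, target))

    candidates = (lattice_vector(basis, c)
                  for c in product(range(-bound, bound + 1), repeat=len(basis)))
    return min(candidates, key=dist)
```

For the basis $(3,1), (1,3)$ the shortest vectors have maximum norm 2 (for instance $(2,-2)$), and the lattice point closest to $(4,3)$ is $(4,4)$.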

These lattice problems have been introduced in the previous century, and
have been studied since. Minkowski and Dirichlet tried, with little success,
to come up with approximation algorithms for these problems. It was much
later that the lattice reduction algorithm was presented by Lenstra, Lenstra
and Lovász [LLL82], achieving a polynomial-time algorithm approximating the
Shortest Lattice Vector to within the exponential factor $2^{n/2}$, where $n$ is the
dimension of the lattice. Babai [Bab86] applied LLL's methods to present an
algorithm that approximates CVP to within a similar factor. Schnorr [Sch85]
improved on LLL's technique, reducing the factor of approximation to $(1 + \epsilon)^n$,
for any constant $\epsilon > 0$, for both CVP and SVP. These positive approximation
results hold for the $\ell_p$ norm for any $p \ge 1$ yet are quite weak, achieving only
extremely large (exponential) approximation factors. The shortest vector problem
is particularly important, quoting [ABSS93], because even the above relatively
weak approximation algorithms have been used in a host of applications, including
integer programming, solving low-density subset-sum problems and breaking
knapsack based codes [LO85], simultaneous diophantine approximation and
factoring polynomials over the rationals [LLL82], and strongly polynomial-time
algorithms in combinatorial optimization [FT85].

Interest in lattice problems has been recently renewed due to a result of Ajtai
[Ajt96], showing a reduction, from a version of SVP, to the average-case of the
same problem.

Only recently, Ajtai [Ajt98] showed a randomized reduction from the NP-complete
problem Subset-Sum to SVP. This has been improved [CN98], showing approx-
imation hardness for some small factor $(1 + \frac{1}{n^\epsilon})$. Very recently [Mic98] has
significantly strengthened Ajtai's result, showing SVP hard to approximate to
within some constant factor.

The above results all apply to SVP$_p$, for finite $p$. SVP with the maximum
norm $\ell_\infty$ appears to be a harder problem. A $g$-approximation algorithm for
SVP$_2$ implies a $\sqrt{n}\,g$-approximation algorithm for SVP$_\infty$, since for every vector
$v$, $\|v\|_\infty \le \|v\|_2 \le \|v\|_\infty \cdot \sqrt{n}$. Thus hardness for approximating SVP$_\infty$ to within
a factor $\sqrt{n}\,g$ will imply the hardness for approximating SVP$_2$ to within factor
$g$. Lagarias showed SVP$_\infty$ to be NP-hard in its exact decision version. Arora
et al. [ABSS93] utilized the PCP characterization of NP to show that both
CVP (for the $\ell_p$ norm for any $p$) and SVP$_\infty$ are quasi-NP-hard to approximate
to within $2^{\log^{1-\epsilon} n}$ for any constant $\epsilon > 0$. Recently, the hardness result for
approximating CVP has been strengthened [DKS98, DKRS99] showing that it
is NP-hard to approximate to within a factor of $n^{c/\log\log n}$ (where $n$ is the
lattice dimension). In this paper we similarly strengthen the hardness result for
approximating SVP$_\infty$.
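
The relation between the two norms used in the paragraph above, $\|v\|_\infty \le \|v\|_2 \le \sqrt{n}\,\|v\|_\infty$, is easy to check numerically. The sketch below is only a sanity check with hypothetical helper names; the tolerance `tol` is an assumption introduced to absorb floating-point rounding.

```python
import math


def linf_norm(v):
    return max(abs(x) for x in v)


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def sandwich_holds(v, tol=1e-9):
    """Check ||v||_inf <= ||v||_2 <= sqrt(n) * ||v||_inf for one vector."""
    n = len(v)
    return (linf_norm(v) <= l2_norm(v) + tol
            and l2_norm(v) <= math.sqrt(n) * linf_norm(v) + tol)


def linf_factor_from_l2(g, n):
    """Factor for SVP_inf obtained from a g-approximation for SVP_2."""
    return math.sqrt(n) * g
```

Both inequalities are tight: a unit vector attains the left one with equality, and the all-ones vector attains the right one.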

So far there is still a huge gap between the positive results, showing approx-
imations for SVP and CVP with exponential factors, and the above hardness
results. Nevertheless, some other results provide a discouraging indication for
improving the hardness result beyond a certain factor. [GG98] showed that
approximating both SVP$_2$ and CVP$_2$ to within $\sqrt{n}$ and approximating SVP$_\infty$ and
CVP$_\infty$ to within $n/O(\log n)$ is in NP $\cap$ co-AM. Hence it is unlikely for any of
these problems to be NP-hard.



Our Result 

We prove that approximating SVP$_\infty$ and CVP$_\infty$ to within a factor of $n^{c/\log\log n}$ is
NP-hard (where $n$ is the lattice dimension and $c > 0$ is some arbitrary constant).



Technique 

We obtain our result by modifying (and slightly simplifying) the framework of
[DKS98, DKRS99]. Starting out from SAT, we construct a new SAT instance
that has the additional property that it is either totally satisfiable, or not even
weakly-satisfiable in some specific sense (to be elaborated upon below). We refer
to such a SAT instance as an SSAT$_\infty$ instance (this is a variant of [DKS98]'s
SSAT). The construction reducing SAT to SSAT$_\infty$ is the main part of the
paper. The construction has a tree-like recursive structure that is a simplification
of techniques from [DKS98, DKRS99], along with some additional observations
tailored to the $\ell_\infty$ norm.

We finally obtain our result by reducing SSAT$_\infty$ to SVP$_\infty$ and to CVP$_\infty$.
These reductions are relatively simple combinatorial reductions, utilizing an ad-
ditional idea from [ABSS93].

Hardness-of-approximation results are naturally divided into those that are
obtained via reduction from PCP, and those that are not. Although the best
previous hardness result for SVP$_\infty$ [ABSS93] relies on the PCP characteriza-
tion of NP, our proof does not. We do, however, utilize some techniques similar
to those used in the proof of the PCP characterization of NP itself. In fact,
the nature of the SVP$_\infty$ problem eliminates some of the technical complica-
tions from [DFK+99, DKS98, DKRS99]. Thus, we believe that SVP$_\infty$ makes a
good candidate (out of all of the lattice problems) for pushing the hardness-of-
approximation factor to within polynomial range.



Structure of the Paper 

Section 2 presents a variant of the SSAT problem from [DKS98] which we call
SSAT$_\infty$. It then proceeds with some standard (and not so standard) definitions.
Section 3 gives the reduction from SAT to SSAT$_\infty$, whose correctness is proven in
Section 4. Finally, in Section 5 we describe the (simple) reduction from SSAT$_\infty$
to SVP$_\infty$ and to CVP$_\infty$, establishing the hardness of approximating SVP$_\infty$ and
CVP$_\infty$.

2 Definitions
2.1 SSAT$_\infty$

A SAT instance is a set $\Psi = \{\psi_1, \ldots, \psi_n\}$ of tests (Boolean functions) over
variables $V = \{v_1, \ldots, v_m\}$. We denote by $\mathcal{R}_{\psi_i}$ the set of satisfying assignments
for $\psi_i \in \Psi$. The Cook-Levin theorem [Coo71, Lev73] states that it is NP-hard
to distinguish whether or not the system is satisfiable (i.e. whether there is an
assignment to the variables that satisfies all of the tests). We next define SSAT$_\infty$,
a version of SAT that has the additional property that when the instance is not
satisfiable, it is not even 'weakly-satisfiable' in a sense that will be formally
defined below.

We recall the following definitions (Definitions 1, 2 and 3) from [DKS98].

Definition 1 (Super-Assignment to Tests). A super-assignment is a func-
tion $S$ mapping to each $\psi \in \Psi$ a value from $\mathbb{Z}^{|\mathcal{R}_\psi|}$. $S(\psi)$ is a vector of integer
coefficients, one for each value $r \in \mathcal{R}_\psi$. Denote by $S(\psi)[r]$ the $r$-th coordinate of
$S(\psi)$.

If $S(\psi) = \vec{0}$ we say that $S(\psi)$ is trivial. If $S(\psi)[r] \ne 0$, we say that the value
$r$ appears in $S(\psi)$. A natural assignment (an assignment in the usual sense) is
identified with a super-assignment that assigns each $\psi \in \Psi$ a unit vector with
a 1 in the corresponding coordinate. In this case, exactly one value appears in
each $S(\psi)$.

We next define the projection of a super-assignment to a test onto each of
its variables. Consistency between tests will amount to equality of projections
on mutual variables.

Definition 2 (Projection). Let $S$ be a super-assignment to the tests. We de-
fine the projection of $S(\psi)$ on a variable $x$ of $\psi$, $\pi_x(S(\psi)) \in \mathbb{Z}^{|\mathcal{F}|}$, in the natural
way:

$$\forall a \in \mathcal{F}: \quad \pi_x(S(\psi))[a] := \sum_{r \in \mathcal{R}_\psi,\; r|_x = a} S(\psi)[r]$$

We shall now proceed to define the notion of consistency between tests. If 
the projections of two tests on each mutual variable x are equal (in other words, 
they both give x the same super-assignment) , we say that the super-assignments 
of the tests are consistent (match). 

Definition 3 (Consistency). Let $S$ be a super-assignment to the tests in $\Psi$.
$S$ is consistent if for every pair of tests $\psi_i$ and $\psi_j$ with a mutual variable $x$,
$\pi_x(S(\psi_i)) = \pi_x(S(\psi_j))$.

Given a system $\Psi = \{\psi_1, \ldots, \psi_n\}$, a super-assignment $S$ to $\Psi$ is called
not-all-zero if there is at least one test $\psi \in \Psi$ for which $S(\psi) \ne \vec{0}$. The norm of
a super-assignment $S$ is defined

$$\|S\| := \max_{\psi \in \Psi} \left( \|S(\psi)\|_1 \right)$$

where $\|S(\psi)\|_1$ is the standard $\ell_1$ norm. The norm of a natural super-assignment
is 1.
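
The notions of projection, consistency and norm can be illustrated on a tiny system. The sketch below is not taken from the paper: it assumes a hypothetical encoding in which a test is a tuple of variable names and a super-assignment maps each test name to a dictionary from satisfying assignments (tuples of values) to integer coefficients.

```python
from collections import defaultdict


def projection(test_vars, coeffs, x):
    """Projection of S(psi) on variable x: sum the coefficients of all
    satisfying assignments r that give x the value a, for every a."""
    idx = test_vars.index(x)
    proj = defaultdict(int)
    for r, c in coeffs.items():
        proj[r[idx]] += c
    # Drop zero entries so that equal projections compare equal.
    return {a: c for a, c in proj.items() if c != 0}


def is_consistent(tests, S):
    """True if every pair of tests agrees on the projection onto each mutual variable."""
    names = list(tests)
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            for x in set(tests[p]) & set(tests[q]):
                if projection(tests[p], S[p], x) != projection(tests[q], S[q], x):
                    return False
    return True


def norm(S):
    """Maximum over all tests of the l_1 norm of S(psi)."""
    return max(sum(abs(c) for c in coeffs.values()) for coeffs in S.values())
```

With tests `{"psi1": ("x", "y"), "psi2": ("y", "z")}`, the super-assignment that gives `psi1` the coefficients `{(0, 1): 1, (0, 0): 1, (1, 0): -1}` and `psi2` the coefficients `{(1, 0): 1}` is consistent, because the two coefficients for $y = 0$ cancel in the projection, yet its norm is 3. This cancellation is what distinguishes super-assignments from natural assignments.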

The gap of SSAT$_\infty$ is formulated in terms of the norm of the minimal super-
assignment that maintains consistency.

Definition 4 ($g$-SSAT$_\infty$). An instance of SSAT$_\infty$ with parameter $g$,

$$I = (\Psi = \{\psi_1, \ldots, \psi_n\},\; V = \{v_1, \ldots, v_m\},\; \{\mathcal{R}_{\psi_1}, \ldots, \mathcal{R}_{\psi_n}\}),$$

consists of a set $\Psi$ of tests over a common set $V$ of variables that take values in a
field $\mathcal{F}$. The parameters $m$ and $|\mathcal{F}|$ are always bounded by some polynomial in $n$.
Each test $\psi \in \Psi$ has associated with it a list $\mathcal{R}_\psi$ of assignments to its variables,
called the satisfying assignments or the range of the test $\psi$. The problem is to
distinguish between the following two cases,

Yes: There is a consistent natural assignment for $\Psi$.

No: No not-all-zero consistent super-assignment is of norm $\le g$.

Remark. The definition of SSAT$_\infty$ differs from that of SSAT only in the
characterization of when a super-assignment falls into the 'no' category. On one
hand, SSAT$_\infty$ imposes a weaker requirement of not-all-zero rather than the
non-triviality of SSAT. On the other hand, the norm of a super-assignment $S$
is measured by a 'stronger' measure, taking the maximum of $\|S(\psi)\|_1$ over all
$\psi$, rather than the average as in SSAT.

Theorem 1 (SSAT$_\infty$ Theorem). SSAT$_\infty$ is NP-hard for $g = n^{c/\log\log n}$ for
some $c > 0$.

We conjecture that a stronger statement is true, which would imply that
SVP$_\infty$ is NP-hard to approximate to within a constant power of the dimension.

Conjecture 2. SSAT$_\infty$ is NP-hard for $g = n^c$ for some constant $c > 0$.



2.2 LDFs, Super-LDFs 

Throughout the paper, let $\mathcal{F}$ denote a finite field $\mathcal{F} = \mathbb{Z}_p$ for some prime number
$p > 1$. We will need the following definitions.

Definition 5 (low degree function - $[r, d]$-LDF). A function $f : \mathcal{F}^d \to \mathcal{F}$ is
said to have degree $r$ if its values are the point evaluation of a polynomial on $\mathcal{F}^d$
with degree $\le r$ in each variable. In this case we say that $f$ is an $[r, d]$-LDF, or
$f \in LDF_{r,d}$.

Sometimes we omit the parameters and refer simply to an LDF.

Definition 6 (low degree extension). Let $m, d$ be natural numbers, and let
$H \subset \mathcal{F}$ such that $|H|^d = m$. A vector $(a_0, \ldots, a_{m-1}) \in \mathcal{F}^m$ can be naturally
identified with a function $A : H^d \to \mathcal{F}$ by looking at points in $H^d$ as representing
numbers in base $|H|$.

There exists exactly one $[|H| - 1, d]$-LDF $\hat{A} : \mathcal{F}^d \to \mathcal{F}$ that extends $A$. $\hat{A}$ is
called the $|H| - 1$ degree extension of $A$ in $\mathcal{F}$.

A $(D+2)$-dimensional affine subspace ($(D+2)$-cube for short) $C \subset \mathcal{F}^d$ is said
to be parallel to the axes if it can be written as $C = x + \mathrm{span}(e_{i_1}, \ldots, e_{i_{D+2}})$,
where $x \in \mathcal{F}^d$ and $e_i \in \mathcal{F}^d$ is the $i$-th axis vector, $e_i = (0, \ldots, 1, \ldots, 0)$. We write
the parameterization of the cube $C$ as follows,

$$C(z) := x + \sum_{j=1}^{D+2} z_j e_{i_j} \in \mathcal{F}^d \quad \text{for } z = (z_1, \ldots, z_{D+2}) \in \mathcal{F}^{D+2}$$

We will need the following (simple) proposition. 

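Proposition 1. Let $f : \mathcal{F}^d \to \mathcal{F}$. Suppose, for every parallel $(D+2)$-cube
$C \subset \mathcal{F}^d$ the function $f|_C : \mathcal{F}^{D+2} \to \mathcal{F}$ defined by

$$\forall x \in \mathcal{F}^{D+2} \quad f|_C(x) := f(C(x))$$

is an $[r, D+2]$-LDF. Then $f$ is an $[r, d]$-LDF.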



Similar to the definition of super-assignments, we define a super-$[r,d]$-LDF
(or a super-LDF for short) $\mathcal{G} \in \mathbb{Z}^{|LDF_{r,d}|}$ to be a vector of integer coefficients $\mathcal{G}[P]$,
one per LDF $P \in LDF_{r,d}$. This definition arises naturally from the fact that the
tests in our final construction will range over LDFs. We further define the norm
of a super-LDF to be the $\ell_1$ norm of the corresponding coefficient vector.

We say that an LDF $P \in LDF_{r,d}$ appears in $\mathcal{G}$ iff $\mathcal{G}[P] \ne 0$. A point $x$ is called
ambiguous for a super-LDF $\mathcal{G}$, if there are two LDFs $P_1 \ne P_2$ appearing in $\mathcal{G}$ such
that $P_1(x) = P_2(x)$. The following (simple) property of low-norm super-LDFs is
heavily used in this paper.

Proposition 2 (Low Ambiguity). Let $\mathcal{G}$ be a super-$[r, d]$-LDF of norm $\le g$.
The fraction of ambiguous points for $\mathcal{G}$ is $\le \mathrm{amb}(r, d, g) := \binom{g}{2} \frac{rd}{|\mathcal{F}|}$.

Proof. The number of non-zero coordinates in an integer vector whose $\ell_1$ norm
is $\le g$ is $\le g$. There are $\le \binom{g}{2}$ pairs of LDFs appearing in $\mathcal{G}$, and each pair agrees
on at most $\frac{rd}{|\mathcal{F}|}$ of the points in $\mathcal{F}^d$. □
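
The counting in this proof can be checked on a small example. The sketch below assumes that the bound reads $\binom{g}{2} \cdot rd/|\mathcal{F}|$, i.e. that two distinct LDFs agree on at most an $rd/|\mathcal{F}|$ fraction of the points; the helper names are hypothetical and the agreement check covers only the univariate case $d = 1$.

```python
from fractions import Fraction
from math import comb


def amb(r, d, g, field_size):
    """Assumed ambiguity bound: (g choose 2) * r * d / |F|."""
    return Fraction(comb(g, 2) * r * d, field_size)


def eval_poly(coeffs, x, p):
    """Evaluate sum_e coeffs[e] * x^e over Z_p."""
    return sum(c * pow(x, e, p) for e, c in enumerate(coeffs)) % p


def agreement_fraction(coeffs1, coeffs2, p):
    """Fraction of points of Z_p on which two univariate polynomials agree."""
    agree = sum(1 for x in range(p)
                if eval_poly(coeffs1, x, p) == eval_poly(coeffs2, x, p))
    return Fraction(agree, p)
```

Over $\mathbb{Z}_7$ the polynomials $x^2$ and $x$ agree exactly at $x = 0$ and $x = 1$, a fraction of $2/7$, which matches $rd/|\mathcal{F}|$ for $r = 2$, $d = 1$.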



The following embedding-extension technique taken from [DFK+99] is used in 
our construction, 



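Definition 7 (embedding extension). Let $b \ge 2$, $k > 1$ and $t$ be natural num-
bers. We define the embedding extension mapping $E_b : \mathcal{F}^t \to \mathcal{F}^{t \cdot k}$ as follows.
$E_b$ maps any point $x = (\xi_1, \ldots, \xi_t) \in \mathcal{F}^t$ to $y \in \mathcal{F}^{t \cdot k}$, $y = E_b(x) = (\eta_1, \ldots, \eta_{t \cdot k})$ by

$$E_b(\xi_1, \ldots, \xi_t) := \left( \xi_1, (\xi_1)^b, (\xi_1)^{b^2}, \ldots, (\xi_1)^{b^{k-1}}, \; \ldots, \; \xi_t, (\xi_t)^b, (\xi_t)^{b^2}, \ldots, (\xi_t)^{b^{k-1}} \right)$$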




The following (simple) proposition shows that any LDF on $\mathcal{F}^t$ can be repre-
sented by an LDF on $\mathcal{F}^{t \cdot k}$ with significantly lower degree:

Proposition 3. Let $f : \mathcal{F}^t \to \mathcal{F}$ be a $[b^k - 1, t]$-LDF, for integers $t > 0$, $b >
1$, $k > 1$. There is a $[b - 1, t \cdot k]$-LDF $f_{ext} : \mathcal{F}^{t \cdot k} \to \mathcal{F}$ such that

$$\forall x \in \mathcal{F}^t : \quad f(x) = f_{ext}(E_b(x))$$

For any $[b - 1, kt]$-LDF $f$, its 'restriction' to the manifold $f|_{E_b} : \mathcal{F}^t \to \mathcal{F}$ is
defined as

$$\forall x \in \mathcal{F}^t \quad f|_{E_b}(x) := f(E_b(x))$$

and is a $[b^k - 1, t]$-LDF (the degree in a variable $\xi_i$ of $f|_{E_b}$ is $(b - 1)(b^0 + b^1 +
\ldots + b^{k-1}) = b^k - 1$).
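
Proposition 3 can be made concrete in the univariate case $t = 1$: a monomial $\xi^e$ with $e < b^k$ is rewritten as $\prod_j \eta_j^{e_j}$, where $e_0, \ldots, e_{k-1}$ are the base-$b$ digits of $e$, so that every variable has degree at most $b - 1$. The sketch below illustrates this under the assumption $\mathcal{F} = \mathbb{Z}_p$; the function names are hypothetical and the construction is one natural way to obtain $f_{ext}$, not necessarily the one used in [DFK+99].

```python
def embed(x, b, k, p):
    """Embedding extension E_b over Z_p: each coordinate xi is replaced
    by (xi, xi^b, xi^(b^2), ..., xi^(b^(k-1)))."""
    return tuple(pow(xi, b ** j, p) for xi in x for j in range(k))


def base_b_digits(e, b, k):
    """The k least significant base-b digits of e, lowest digit first."""
    digits = []
    for _ in range(k):
        digits.append(e % b)
        e //= b
    return digits


def eval_f(coeffs, xi, p):
    """Evaluate the univariate polynomial sum_e coeffs[e] * xi^e over Z_p."""
    return sum(c * pow(xi, e, p) for e, c in enumerate(coeffs)) % p


def eval_f_ext(coeffs, y, b, k, p):
    """Evaluate f_ext at y in (Z_p)^k: the monomial xi^e becomes
    prod_j y[j]^(digit_j(e)), so each variable has degree <= b - 1."""
    total = 0
    for e, c in enumerate(coeffs):
        term = c
        for eta, digit in zip(y, base_b_digits(e, b, k)):
            term = term * pow(eta, digit, p) % p
        total = (total + term) % p
    return total
```

For any coefficient list of length at most $b^k$, `eval_f(coeffs, xi, p)` and `eval_f_ext(coeffs, embed((xi,), b, k, p), b, k, p)` agree for every `xi`, which is the identity $f(x) = f_{ext}(E_b(x))$ of the proposition.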



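Let $\mathcal{G}$ be a super-$[b^k - 1, t]$-LDF (i.e. a vector in $\mathbb{Z}^{|LDF_{b^k-1,t}|}$). Its embedding-
extension is the super-$[b - 1, tk]$-LDF $\hat{\mathcal{G}}$ defined by,

$$\forall f \in LDF_{b-1,tk} \quad \hat{\mathcal{G}}[f] := \mathcal{G}[f|_{E_b}]$$

In a similar manner, the restriction $\check{\mathcal{G}}$ of a super-$[b - 1, tk]$-LDF $\mathcal{G}$ is a super-
$[b^k - 1, t]$-LDF defined by

$$\forall f \in LDF_{b^k-1,t} \quad \check{\mathcal{G}}[f] := \mathcal{G}[f_{ext}]$$

The following proposition holds (e.g. by a counting argument),

Proposition 4. Let $\mathcal{G}_1, \mathcal{G}_2$ be two super-$[b - 1, tk]$-LDFs, and let $\check{\mathcal{G}}_1, \check{\mathcal{G}}_2$ be their
respective restrictions (with parameter $b$). $\mathcal{G}_1 = \mathcal{G}_2$ if and only if $\check{\mathcal{G}}_1 = \check{\mathcal{G}}_2$.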

3 The Construction 

We prove that SSAT$_\infty$ is NP-hard via a reduction from SAT, described herein.
We adopt the whole framework of the construction from [DKRS99], and refer
the reader there for a more detailed exposition.

Let $\Phi = \{\varphi_1, \ldots, \varphi_n\}$ be an instance of SAT, viewed as a set of Boolean tests
over Boolean variables $V_\Phi = \{x_1, \ldots, x_m\}$ ($m = n^c$ for some constant $c > 0$) such
that each test depends on $D = O(1)$ variables. Cook's theorem [Coo71] states
that it is NP-hard to decide whether there is an assignment for $V_\Phi$ satisfying all
of the tests in $\Phi$.

Starting from $\Phi$, we shall construct an SSAT$_\infty$ test-system $\Psi$ over variables
$V_\Psi \supset V_\Phi$. Our new variables will be non-Boolean, ranging over a field $\mathcal{F}$,
with $|\mathcal{F}| = n^{c/\log\log n}$ for some constant $c > 0$. An assignment to $V_\Psi$ will be
interpreted as an assignment to $V_\Phi$ by identifying the value $0 \in \mathcal{F}$ with the
Boolean value true and any non-zero value with false.



3.1 Constructing the CR-Forest

In order to construct the SSAT$_\infty$ instance $I = (\Psi, V_\Psi, \{\mathcal{R}_{\psi_1}, \ldots, \mathcal{R}_{\psi_n}\})$ we need
to describe for each test $\psi \in \Psi$ which variables it depends on, and its satisfying
assignments $\mathcal{R}_\psi$. We begin by constructing the CR-forest, which is a combina-
torial object holding the underlying structure of $\Psi$. The forest $F_n(\Phi)$ will have
a tree $T_\varphi$ for every test $\varphi \in \Phi$. Each node in the forest will have a set of vari-
ables associated with it. For every leaf there will be one test depending on the
variables associated with that leaf.

Let us (briefly) describe one tree $T_\varphi$ in the forest $F_n(\Phi)$.

Every tree will be of depth $K \le \log\log n$ (however, not all of the leaves will
be at the bottom level).

Each node $v$ in the tree will have a domain $dom_v = \mathcal{F}^d$ of points ($dom_v = \mathcal{F}^{d_0}$
in case $v$ is the root node) associated with it.

The offsprings of a non-leaf node $v$ will be labeled each by a distinct $(D+2)$-
cube $C$ of $dom_v$ (this part is slightly simpler than in [DKRS99]),

$$\mathrm{labels}(v) := \{ C \mid C \text{ is a } (D+2)\text{-cube in } dom_v \}.$$

The points in the domain $dom_v$ of each node $v$ will be mapped to some of
$\Psi$'s variables, by the injection $var_v : dom_v \to V_\Psi$. This mapping essentially
describes the relation of a node to its parent, and is defined inductively as follows.
For each node $v$, we denote by $V_v$ the set of 'fresh new' variables mapped from
$dom_v$ (i.e. none of the nodes defined inductively so far have points mapped to
these variables). Altogether

$$V_\Psi = \bigcup_{v \in F_n(\Phi)} V_v.$$

For the root node, $var_{root_\varphi} : dom_{root_\varphi} \to V_\Psi$ is defined (exactly as in
[DKRS99]) by mapping $H^{d_0} \subset dom_{root_\varphi} = \mathcal{F}^{d_0}$ to $V_\Phi$ and the rest of the
points to the rest of $V_{root_\varphi} := \hat{V}_\Phi$ (i.e. the low-degree-extension of $V_\Phi$). It
is important that $var_{root_\varphi}$ is defined independently of $\varphi$.

For a non-root node $v$ with parent $u$, the points of the cube $C_v \in \mathrm{labels}(u)$
labeling $v$ are mapped into the domain $dom_v$ by the embedding extension
mapping, $E_{b_v} : C_v \to dom_v$, defined above in Section 2.2 (the parameter $b_v$
specified below depends on the specific node $v$, rather than just on $v$'s level
as in [DKRS99]). These points are $u$'s points that are 'passed on' to the off-
spring $v$. We think of the point $y = E_{b_v}(x) \in dom_v$ as 'representing' the point
$x \in C_v \subset dom_u$, and define $var_v : dom_v \to V_\Psi$ as follows.

Definition 8 ($var_v$, for a non-root node $v$). Let $v$ be a non-root node, let $u$
be $v$'s parent, and let $C_v \subset dom_u$ be the label attached to $v$. For each point $y \in
E_{b_v}(C_v) \subset dom_v$ define $var_v(y) := var_u(E_{b_v}^{-1}(y))$, i.e. points that 'originated'
from $C_v$ are mapped to the previous-level variables that their pre-images in $C_v$
were mapped to. For each 'new' point $y \in dom_v \setminus E_{b_v}(C_v)$ we define $var_v(y)$ to
be a distinct variable from $V_v$.

The parameters used for the embedding extension mappings $E_{b_v}$ are $t =
D + 2$, $k = d/t = a$. We set the degree of the root node $r_{root_\varphi} = |H|$,
and $r_v$ and $b_v$ (for non-root nodes $v$) are defined by the following recursive
formulas:

$$b_v := \begin{cases} \sqrt[a]{r_u + 1} & C_v \text{ is parallel to the axes} \\ \sqrt[a]{r_u (D + 2) + 1} & \text{otherwise} \end{cases} \qquad r_v := b_v - 1$$

We stop the recursion and define a node to be a leaf (i.e. define its labels
to be empty) whenever $r_v \le 2(D + 2)$. A simple calculation (to appear in the
complete version) shows that $r_v, b_v$ decrease with the level of $v$ until for some
level $K \le \log\log n$, $r_v \le 2(D + 2) = O(1)$. (This may happen to some nodes
sooner than others, therefore not all of the leaves are in level $K$.)

We now complete the construction by describing the tests and their satisfying 
assignments. 

Definition 9 (Tests). $\Psi$ will have one test $\psi_v$ for each leaf $v$ in the forest. $\psi_v$
will depend on the variables in $var_v(dom_v)$. The set of satisfying assignments
for $\psi_v$'s variables, $\mathcal{R}_{\psi_v}$, will consist of assignments $A$ that satisfy the following
two conditions:

1. $A$ is an $[r_v, d]$-LDF on $var_v(dom_v)$.

2. If $v \in T_\varphi$ for $\varphi \in \Phi$ and $\varphi$'s variables appear in $var_v(dom_v)$, then $A$ must
satisfy $\varphi$.

4 Correctness of the Construction 

4.1 Completeness 

Lemma 1 (completeness). If there is an assignment $A : V_\Phi \to \{\text{true}, \text{false}\}$
satisfying all of the tests in $\Phi$, then there is a natural assignment $A_\Psi : V_\Psi \to \mathcal{F}$
satisfying all of the tests in $\Psi$.

We extend $A$ in the obvious manner, i.e. by taking its low-degree-extension (see
Definition 6) to the variables $\hat{V}_\Phi$, and then repeatedly taking the embedding
extension of the previous-level variables, until we have assigned all of the variables
in the system. More formally,

Proof. We construct an assignment $A_\Psi : V_\Psi \to \mathcal{F}$ by inductively obtaining $[r_v, d]$-
LDFs $P_v : dom_v \to \mathcal{F}$ for every level-$i$ node $v$ of every tree in the CR-forest, as
follows. We first set (for every $\varphi \in \Phi$) $P_{root_\varphi}$ to be the low degree extension (see
Definition 6) of $A$ (we think of $A$ as assigning each variable a value in $\{0, 1\} \subset \mathcal{F}$
rather than $\{\text{true}, \text{false}\}$, see discussion in the beginning of Section 3). Assume
we have defined an $[r_u, d]$-LDF $P_u$ consistently for all level-$i$ nodes, and let $v$ be an
offspring of $u$, labeled by $C_v$. The restriction $f = P_u|_{C_v}$ of $P_u$ to the cube $C_v$ is an
$[r, D + 2]$-LDF where $r = r_u$ or $r = r_u(D + 2)$ depending on whether $C_v$ is parallel
to the axes or not. $f$ can be written as a $[\sqrt[a]{r + 1} - 1, a \cdot (D + 2)]$-LDF $f_{ext}$
over the larger domain $\mathcal{F}^d$, as promised by Proposition 3. We define $P_v := f_{ext}$
to be that $[r_v, d]$-LDF (recall that $d = a \cdot (D + 2)$ and $b_v = \sqrt[a]{r + 1}$).

Finally, for a variable $x \in var_v(dom_v)$, $x = var_v(\bar{x})$, we set $A_\Psi(x) := P_v(\bar{x})$. The
construction implies that there are no collisions, i.e. $x' = var_{v'}(\bar{x}') = var_v(\bar{x}) =
x$ implies $P_v(\bar{x}) = P_{v'}(\bar{x}')$. □



4.2 Soundness 

We need to show that a 'no' instance of SAT is mapped to a 'no' instance of
SSAT$_\infty$. We assume that the constructed SSAT$_\infty$ instance has a consistent non-
trivial super-assignment of norm $\le g$, and show that $\Phi$ - the SAT instance we
started with - is satisfiable.

Lemma 2 (Soundness). Let $g := |V|$. If there exists a consistent super-
assignment of norm $\le g$ for $\Psi$, then $\Phi$ is satisfiable.

Let $\mathcal{A}$ be a consistent non-trivial super-assignment for $\Psi$, of size $\|\mathcal{A}\|_\infty \le g$.
It induces (by projection) a super-assignment to the variables

$$m : V_\Psi \to \mathbb{Z}^{|\mathcal{F}|}$$

i.e. for every variable $x \in V_\Psi$, $m$ assigns a vector $\pi_x(\mathcal{A}(\psi))$ of integer coefficients,
one per value in $\mathcal{F}$, where $\psi$ is some test depending on $x$. Since $\mathcal{A}$ is consistent,
$m$ is well defined (independent of the choice of test $\psi$). Alternatively, we view
$m$ as a labeling of the points $\bigcup_{v \in F_n(\Phi)} dom_v$ by a 'super-value' - a formal
linear combination of values from $\mathcal{F}$. The label of the point $x \in dom_v$ for
some $v \in F_n(\Phi)$ is simply $m(var_v(x))$, and with a slight abuse of notation, is
sometimes denoted $m(x)$. $m$ is used as the "underlying point super-assignment"
for the rest of the proof, and will serve as an anchor by which we test consistency.

The central task of our proof is to show that if a tree has a non-trivial leaf, 
then there is a non-trivial super-LDF for the domain in the root node that is 
consistent with m. We will later want to construct from these super-LDFs an 
assignment that satisfies all of the tests in <P. For this purpose, we need the 
super-LDFs along the way to be legal, 

Definition 10 (legal). An LDF F is called legal for a node v G (for 
some £ <P), if it satisfies ip in the sense that if ip’s variables have pre-images 
xi,..,xjj G dom„, then F{xi) , F{xjj) satisfy p>. A super-LDF Q is calledlegal 
for V G if for every LDF P appearing in Q, P is legal for v G T^. 

The following lemma encapsulates the key inductive step in our soundness 
proof, 

Lemma 3. Let u G nodes^ for some 0 < i < K . There is a legal swper-[r„, cf]- 

def 

LDF Qu with |[t/„||i < ||m||oo = maxj, ||m(x)||i such that for every x G dom„, 
'Xx {Qu) = m{x) . Furthermore, if there is a node v in u’s sub-tree for which Qv 0 
then Qu ^ 0- 

Due to space limitations, the proof of this lemma is omitted, and appears in the 
full version of this paper. 

In order to complete the soundness proof, we need to find a satisfying as- 
signment for <P. We obtained, in Lemma 3, a super-[ro, d]-LDF for each root 
node root^, such that Vx G domroot^ = , m{x) = tvx{Q^). Note that indeed, 

for every pair of tests ^ ip' , the corresponding super-LDFs must be equal 
Q^f ~ Qif' (denote them hy Q). This follows because they are point-wise equal 
tTx{Q,^) = rn{x) = 'Kx{Qip')i and so the difference super-LDF Q^p —global is trivial 
on every point, and must therefore (again, by Proposition 2 - low- ambiguity) be 
trivial. 

If A is not trivial, then there is at least one test tpv for which A(V') ^ 0 . 
Thus, denoting by ip the test for which w is a leaf in T^, Lemma 3 implies 
Q = Qi^ 0- Take an LDF / that appears in Q, and define for every w G F#, 

A{v) '^= f{x) where x G is the point mapped to v. Since Q is legal, <P is 
totally satisfied by A. 

5 From SSAT$_\infty$ to SVP$_\infty$

In this section we show the reduction from $g$-SSAT$_\infty$ to the problem of
approximating SVP$_\infty$. This reduction follows the same lines as the reduction in
[ABSS93] from Pseudo-Label-Cover to SVP$_\infty$. We begin by formally defining
the gap-version of SVP$_\infty$ (presented in Section 1), which is the standard method
to turn an approximation problem into a decision problem.

Definition 11 ($g$-SVP$_\infty$). Given a lattice $\mathcal{L}$ and a number $d > 0$, distinguish
between the following two cases:

Yes. There is a non-zero lattice vector $v \in \mathcal{L}$ with $\|v\|_\infty \le d$.

No. Every non-zero lattice vector $v \in \mathcal{L}$ has $\|v\|_\infty > g \cdot d$.

We will show a reduction from $g$-SSAT$_\infty$ to $\sqrt{g}$-SVP$_\infty$, thereby implying
SVP$_\infty$ to be NP-hard to approximate to within a factor of $\sqrt{g} = n^{\Omega(1)/\log\log n}$.
Let $I = (\Psi, V, \{\mathcal{R}_\psi\})$ be an instance of $g$-SSAT$_\infty$, where $\Psi = \{\psi_1, \ldots, \psi_n\}$
is a set of tests over variables $V = \{v_1, \ldots, v_m\}$, and $\mathcal{R}_{\psi_i}$ is the set of satisfying
assignments for $\psi_i \in \Psi$. We construct a $\sqrt{g}$-SVP$_\infty$ instance $(\mathcal{L}(B), d)$ where
$d := 1$ and $B$ is an integer matrix whose columns form the basis for the lattice
$\mathcal{L}(B)$.

The matrix $B$ will have a column for every pair of a test $\psi \in \Psi$ and an
assignment $r \in \mathcal{R}_\psi$ for it. There will be one additional special column $t$. The
matrix $B$ will have two kinds of rows, consistency rows and norm-measuring
rows, defined as follows.

Consistency Rows. $B$ will have $|\mathcal{F}| + 1$ rows for each threesome $(\psi_i, \psi_j, x)$ where
$\psi_i$ and $\psi_j$ are tests that depend on a mutual variable $x$. Only the columns of $\psi_i$
and $\psi_j$ will have non-zero values in these rows.

The special column $t$ will have $\sqrt{g}$ in each consistency row, and zero in the
other rows.

For a pair of tests $\psi_i$ and $\psi_j$ that depend on a mutual variable $x$, let us
concentrate on the sub-matrix consisting of the columns of these tests and the
$|\mathcal{F}| + 1$ rows of the pair $\psi_i, \psi_j$, viewed as a pair of matrices $G_1$ of dimension
$(|\mathcal{F}| + 1) \times |\mathcal{R}_{\psi_i}|$ and $G_2$ of dimension $(|\mathcal{F}| + 1) \times |\mathcal{R}_{\psi_j}|$. Let $r \in \mathcal{R}_{\psi_i}$ be a
satisfying assignment for $\psi_i$ and $r' \in \mathcal{R}_{\psi_j}$ be a satisfying assignment for $\psi_j$.
The $r$-th column in $G_1$ equals $\sqrt{g}$ times the unit vector $e_i$ where $i = r|_x$ (i.e.
a vector with zeros everywhere and a $\sqrt{g}$ in the $r|_x$-th coordinate). The $r'$-th
column in $G_2$ equals $\sqrt{g} \cdot (\mathbf{1} - e_i)$ where $i = r'|_x$ and $\mathbf{1}$ is the all-one vector (i.e.
$\sqrt{g}$ everywhere except a zero in the $r'|_x$-th coordinate).

Notice that any zero-sum linear combination of the vectors $\{e_i, \mathbf{1} - e_i, \mathbf{1}\}_i$
must give $e_i$ the same coefficient as $\mathbf{1} - e_i$, because the vectors $\{\mathbf{1}, e_i\}_i$ are linearly
independent.
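The observation can be spelled out in one line. The coefficient names $a_i$, $b_i$ and $c$ below are introduced only for this illustration and do not appear in the construction itself.

```latex
% Coefficients a_i (for e_i), b_i (for 1 - e_i) and c (for 1) are illustrative names.
\[
  \sum_i a_i\, e_i + \sum_i b_i\,(\mathbf{1} - e_i) + c\,\mathbf{1}
  \;=\; \sum_i (a_i - b_i)\, e_i + \Bigl(c + \sum_i b_i\Bigr)\mathbf{1} \;=\; 0 .
\]
% Linear independence of {1, e_i}_i forces every coefficient on the right to vanish:
\[
  a_i = b_i \ \text{ for all } i, \qquad c = -\sum_i b_i .
\]
```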



Norm-measuring Rows. There will be a set of $|\mathcal{R}_\psi|$ rows designated to each test
$\psi \in \Psi$ in which only $\psi$'s columns have non-zero values. The matrix $B$, when
restricted to these rows and to the columns of $\psi$, will be the $(|\mathcal{R}_\psi| \times |\mathcal{R}_\psi|)$
Hadamard matrix $H$ (we assume for simplicity that $|\mathcal{R}_\psi|$ is a power of 2, thus
such a matrix exists, see [Bol86] p. 74). Recall that the Hadamard matrix $H_n$ of
order $2^n \times 2^n$ is defined by $H_0 = (1)$ and

\[ H_n = \begin{pmatrix} H_{n-1} & H_{n-1} \\ H_{n-1} & -H_{n-1} \end{pmatrix}. \]

The vector $t$, as mentioned earlier, will be zero on these rows.

Proposition 5 (Completeness). If there is a natural assignment to $\Psi$, then
there is a non-zero lattice vector $v \in \mathcal{L}(B)$ with $\|v\|_\infty = 1$.

Proof. Let $A$ be a consistent natural assignment for $\Psi$. We claim that

\[ v = t - \sum_{\psi \in \Psi} v_{[\psi, A(\psi)]}, \]

where $v_{[\psi, r]}$ denotes the column of $B$ for the test $\psi$ and the assignment $r$,
is a lattice vector with $\|v\|_\infty = 1$. Restricting $\sum_{\psi \in \Psi} v_{[\psi, A(\psi)]}$ to an arbitrary
row in the consistency rows (corresponding to a pair of tests $\psi_i, \psi_j$ with mutual
variable $x$) gives $\sqrt{g}$, because $A(\psi_i)|_x = A(\psi_j)|_x$. Subtracting this from $t$ gives
zero in each consistency row.

In the norm-measuring rows, since every test $\psi \in \Psi$ is assigned one value by
$A$, $v$ restricted to $\psi$'s rows equals, up to sign, some column of the Hadamard matrix, which
is a $\pm 1$ matrix. Altogether, $\|v\|_\infty = 1$ as claimed. □
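The recursive definition of the Hadamard matrix can be checked on small orders. The following is a minimal sketch, not part of the reduction itself; the function names `hadamard` and `gram` are chosen here for illustration only.

```python
def hadamard(n):
    """Return the 2^n x 2^n Hadamard matrix built by the recursion H_0 = (1)."""
    h = [[1]]
    for _ in range(n):
        top = [row + row for row in h]                   # (H_{n-1}  H_{n-1})
        bottom = [row + [-x for x in row] for row in h]  # (H_{n-1} -H_{n-1})
        h = top + bottom
    return h


def gram(h):
    """Return H * H^T; for a Hadamard matrix of order N this is N times the identity."""
    size = len(h)
    return [[sum(h[i][k] * h[j][k] for k in range(size)) for j in range(size)]
            for i in range(size)]


H2 = hadamard(2)
for row in H2:
    print(row)
```

Every entry is $\pm 1$ and distinct rows are orthogonal, which is the property used in both the completeness and the soundness argument.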



Proposition 6 (Soundness). If there is a non-zero lattice vector $v \in \mathcal{L}(B)$
with $\|v\|_\infty < \sqrt{g}$, then there is a consistent non-trivial super-assignment $A$ for
$\Psi$, for which $\|A\|_\infty < g$.

Proof. Let

\[ v = c_t \cdot t + \sum_{\psi, r} c_{[\psi, r]} \cdot v_{[\psi, r]} \]

be a lattice vector with $\|v\|_\infty < \sqrt{g}$, where $v_{[\psi, r]}$ is the column of $B$ for the test $\psi$
and the assignment $r$, and the coefficients are integers. The entries in the consistency rows of every
lattice vector are integer multiples of $\sqrt{g}$. The assumption $\|v\|_\infty < \sqrt{g}$ implies
that $v$ is zero on these rows.

Define a super-assignment $A$ to $\Psi$ by setting for each $\psi \in \Psi$ and $r \in \mathcal{R}_\psi$,

\[ A(\psi)[r] := c_{[\psi, r]}. \]

To see that $A$ is consistent, let $\psi_i, \psi_j \in \Psi$ both depend on the variable $x$.
Notice that (as mentioned above) any zero-sum linear combination of the vectors
$\{\mathbf{1}, e_k, \mathbf{1} - e_k\}_k$ must give $e_k$ and $\mathbf{1} - e_k$ the same coefficient because the vectors
$\{\mathbf{1}, e_k\}_k$ are linearly independent. This implies that for any value $k \in \mathcal{F}$ for $x$,

\[ \sum_{r|_x = k} c_{[\psi_i, r]} = \sum_{r'|_x = k} c_{[\psi_j, r']}. \]

This, in turn, means that $\pi_x(A(\psi_i)) = \pi_x(A(\psi_j))$, thus $A$ is consistent.

$A$ is also not-all-zero because $v \ne 0$ (if only $c_t$ was non-zero, then $\|v\|_\infty \ge \sqrt{g}$).
The norm of $A$ is defined as

\[ \|A\|_\infty = \max_{\psi \in \Psi} \left( \|A(\psi)\|_1 \right). \]

The vector $v$ restricted to the norm-measuring rows of $\psi$ is exactly $H \cdot A(\psi)$. Now
since $\frac{1}{\sqrt{|\mathcal{R}_\psi|}} H$ is a $(|\mathcal{R}_\psi| \times |\mathcal{R}_\psi|)$ orthonormal matrix, we have

\[ \left\| \tfrac{1}{\sqrt{|\mathcal{R}_\psi|}} H A(\psi) \right\|_2 = \|A(\psi)\|_2. \]

Since for every $z \in \mathbb{R}^n$, $\|z\|_\infty \ge \|z\|_2 / \sqrt{n}$, we obtain $\|H A(\psi)\|_\infty \ge \|A(\psi)\|_2$.
Now for every integer vector $z$, $\sqrt{\|z\|_1} \le \|z\|_2$, and altogether,

\[ \sqrt{\|A(\psi)\|_1} \le \|A(\psi)\|_2 \le \|H A(\psi)\|_\infty \le \|v\|_\infty < \sqrt{g}, \]

showing $\|A\|_\infty = \max_{\psi \in \Psi} \left( \|A(\psi)\|_1 \right) < g$ as claimed. □

Finally, if $\Psi$ is a SSAT$_\infty$ no instance, then the norm of any consistent non-trivial
super-assignment $A$ must be at least $g$, and so, by Proposition 6, the norm of the shortest
lattice vector in $\mathcal{L}(B)$ must be at least $\sqrt{g}$. This completes the proof of the reduction.

The reduction to CVP$_\infty$ is quite similar, taking $t$ to be the target vector, and is
omitted.
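The chain of norm inequalities in the soundness proof can be checked numerically on a small example. This is a minimal sketch, assuming a vector whose length is a power of 2; the helper names `hadamard` and `norm_chain` and the sample vector are illustrative and not taken from the paper.

```python
import math


def hadamard(n):
    """2^n x 2^n Hadamard matrix from the recursion H_0 = (1)."""
    h = [[1]]
    for _ in range(n):
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def norm_chain(z):
    """Return (sqrt(||z||_1), ||z||_2, ||H z||_inf) for an integer vector z."""
    n = len(z).bit_length() - 1
    assert len(z) == 2 ** n, "length of z must be a power of 2"
    h = hadamard(n)
    hz = [sum(a * b for a, b in zip(row, z)) for row in h]
    root_l1 = math.sqrt(sum(abs(x) for x in z))
    l2 = math.sqrt(sum(x * x for x in z))
    linf = max(abs(x) for x in hz)
    return root_l1, l2, linf


# Hypothetical coefficient vector A(psi) of a single test psi.
root_l1, l2, linf = norm_chain([1, -1, 2, 0])
print(root_l1, l2, linf)
```

For this vector the three values are non-decreasing, as the proof requires for every integer vector.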



References 



[ABSS93] S. Arora, L. Babai, J. Stern, and Z. Sweedyk. The hardness of approximate optima in lattices, codes and linear equations. In Proc. 34th IEEE Symp. on Foundations of Computer Science, pages 724-733, 1993.

[Ajt96] M. Ajtai. Generating hard instances of lattice problems. In Proc. 28th ACM Symp. on Theory of Computing, pages 99-108, 1996.

[Ajt98] Miklos Ajtai. The shortest vector problem in L2 is NP-hard for randomized reductions. In Proceedings of the 30th Annual ACM Symposium on Theory of Computing (STOC-98), pages 10-19, New York, May 23-26 1998. ACM Press.

[Bab86] L. Babai. On Lovász's lattice reduction and the nearest lattice point problem. Combinatorica, 6:1-14, 1986.

[Bol86] B. Bollobás. Combinatorics. Cambridge University Press, 1986.

[CN98] J.Y. Cai and A. Nerurkar. Approximating the SVP to within a factor $(1 + 1/\dim^{\varepsilon})$ is NP-hard under randomized reductions. In Proc. of the 13th Annual IEEE Conference on Computational Complexity, pages 46-55, 1998.

[Coo71] S. Cook. The complexity of theorem-proving procedures. In Proc. 3rd ACM Symp. on Theory of Computing, pages 151-158, 1971.

[DFK+99] Dinur, Fischer, Kindler, Raz, and Safra. PCP characterizations of NP: Towards a polynomially-small error-probability. In STOC: ACM Symposium on Theory of Computing (STOC), 1999.

[DKRS99] I. Dinur, G. Kindler, R. Raz, and S. Safra. Approximating-CVP to within almost-polynomial factors is NP-hard. Manuscript, 1999.

[DKS98] Dinur, Kindler, and Safra. Approximating-CVP to within almost-polynomial factors is NP-hard. In FOCS: IEEE Symposium on Foundations of Computer Science (FOCS), 1998.

[FT85] András Frank and Éva Tardos. An application of simultaneous approximation in combinatorial optimization. In 26th Annual Symposium on Foundations of Computer Science, pages 459-463, Portland, Oregon, 21-23 October 1985. IEEE.

[GG98] O. Goldreich and S. Goldwasser. On the limits of non-approximability of lattice problems. In Proc. 30th ACM Symp. on Theory of Computing, pages 1-9, 1998.

[Lev73] L. Levin. Universal'nyie perebornyie zadachi (universal search problems: in Russian). Problemy Peredachi Informatsii, 9(3):265-266, 1973.

[LLL82] A.K. Lenstra, H.W. Lenstra, and L. Lovász. Factoring polynomials with rational coefficients. Math. Ann., 261:513-534, 1982.

[LO85] J. C. Lagarias and A. M. Odlyzko. Solving low-density subset sum problems. Journal of the ACM, 32(1):229-246, January 1985.

[Mic98] D. Micciancio. The shortest vector in a lattice is hard to approximate to within some constant. In Proc. 39th IEEE Symp. on Foundations of Computer Science, 1998.

[Sch85] C. P. Schnorr. A hierarchy of polynomial-time basis reduction algorithms. In Proceedings of Conference on Algorithms, Pécs (Hungary), pages 375-386. North-Holland, 1985.




Convergence Analysis of 
Simulated Annealing-Based Algorithms Solving 
Flow Shop Scheduling Problems* 



Kathleen Steinhöfel¹, Andreas Albrecht², and Chak-Kuen Wong²



¹ GMD - National Research Center for Information Technology,
Kekulestr. 7, D-12489 Berlin, Germany
² Dept. of Computer Science and Engineering,
The Chinese University of Hong Kong, Shatin, N.T., Hong Kong



Abstract. In the paper, we apply logarithmic cooling schedules of inhomogeneous
Markov chains to the flow shop scheduling problem with the
objective to minimize the makespan. In our detailed convergence analysis,
we prove a lower bound on the number of steps that is sufficient
to approach an optimum solution with a certain probability. The result
is related to the maximum escape depth $\Gamma$ from local minima of the
underlying energy landscape. The number of steps $k$ which are required
to approach with probability $1 - \delta$ the minimum value of the makespan
is lower bounded by $n^{O(\Gamma)} \cdot \log^{O(1)}(1/\delta)$. The auxiliary computations
are of polynomial complexity. Since the model cannot be approximated
arbitrarily closely in the general case (unless P = NP), the approach
might be used to obtain approximation algorithms that work well for
the average case.



1 Introduction 

In the flow shop scheduling problem, $n$ jobs have to be processed on $m$ different
machines. Each job consists of a sequence of tasks that have to be processed
during an uninterrupted time period of a fixed length. The order in which each
job is processed by the machines is the same for all jobs. A schedule is an
allocation of the tasks to time intervals on the machines, and the aim is to
find a schedule that minimizes the overall completion time, which is called the
makespan.

Flow shop scheduling has long been identified as having a number of important
practical applications. Baumgartel addresses in [4] the flow shop problem in
order to deal with the planning of material flow in car plants. His approach was
applied to the logistics for the Mercedes Benz automobile. The NP-hardness of
the general problem setting with $m \ge 3$ was shown by Garey, Johnson, and Sethi
[10] in 1976. The existence of a polynomial approximation scheme for the flow
shop scheduling problem with an arbitrary fixed number of machines is demonstrated
by Hall in [12]. A recent work of Williamson et al. constitutes theoretical
evidence that the general problem, which is considered in the present paper, is
hard to solve even approximately. They proved that finding a schedule that is
shorter than 5/4 times the optimum is NP-hard [23].

* ... RGC Earmarked Grant, Ref. No. CUHK 4367/99E, and by the HK-Germany Joint
Research Scheme under Grant No. D/9800710.

We concentrate on the convergence analysis of simulated annealing-based
algorithms which employ a logarithmic cooling schedule. The algorithm
employs a simple neighborhood which is reversible and ensures a priori that
transitions always result in a feasible solution. The neighborhood relation determines
a landscape of the objective function over the configuration space $\mathcal{F}$ of feasible
solutions of a given flow shop scheduling problem. Let $a_S(k)$ denote the probability
to obtain the schedule $S \in \mathcal{F}$ after $k$ steps of a logarithmic cooling schedule.
The problem is to find an upper bound for $k$ such that $a_S(k) > 1 - \varepsilon$
for schedules $S$ minimizing the makespan. The general framework of logarithmic
cooling schedules has been studied intensely, e.g., by B. Hajek [11] and O. Catoni
[5,6].

Our convergence result, i.e., the lower bound of the number of steps $k$, is
based on a very detailed analysis of transition probabilities between neighboring
elements of the configuration space $\mathcal{F}$. We obtain a run-time of
$n^{O(\Gamma)} \cdot \log^{O(1)}(1/\delta)$ to have with probability $1 - \delta$ a schedule with the minimum value
of the makespan, where $\Gamma$ is a parameter of the energy landscape characterizing
the escape depth from local minima.



2 The Flow Shop Problem 

The flow shop scheduling problem can be formalized as follows. There are a set
$\mathcal{J}$ of $l$ jobs and a set $\mathcal{M}$ of $m$ machines. Each job has exactly one task to be
processed on each machine. Therefore, we have $n := l \cdot m$ tasks, each with a given
processing time $p(t) \in \mathbb{N}$. There is a binary relation $R$ on the set of tasks $T$ that
decomposes $T$ into chains corresponding to the jobs. The binary relation, which
represents precedences between the tasks, is defined as follows: For every $t \in T$
there exists at most one $t'$ such that $(t, t') \in R$. If $(t, t') \in R$, then $J(t) = J(t')$
and there is no $x \notin \{t, t'\}$ such that $(t, x) \in R$ or $(x, t') \in R$. For any $(v, w) \in R$, $v$
has to be performed before $w$. $R$ induces a total ordering of the tasks belonging
to the same job. There exist no precedences between tasks of different jobs.
Clearly, if $(v, w) \in R$ then $M(v) \ne M(w)$. The order in which a job passes all
machines is the same for all jobs. As the task number of a task $t$ we will denote
the number of tasks preceding $t$ within its job. We can therefore assume that all
tasks with task number $i$ are processed on machine $M_i$. A schedule is a function
$S : T \to \mathbb{N} \cup \{0\}$ that for each task $t$ defines a starting time $S(t)$.

The length, respectively the makespan, of a schedule $S$ is defined by

\[ \lambda(S) := \max_{v \in T} \left( S(v) + p(v) \right), \tag{1} \]

i.e., the earliest time at which all tasks are completed. The problem is to find an
optimum schedule, that is, one that is feasible and of minimum length. A flow shop
scheduling problem can be represented by a disjunctive graph, a model introduced by
Roy and Sussmann in [17]. The disjunctive graph is a graph $G = (V, A, E, \mu)$,
which is defined as follows:

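\[
\begin{aligned}
V &= T \cup \{I, O\}, \\
A &= \{[v, w] \mid v, w \in T,\ (v, w) \in R\} \ \cup \\
  &\quad\ \{[I, w] \mid w \in T,\ \nexists\, v \in T : (v, w) \in R\} \ \cup \\
  &\quad\ \{[v, O] \mid v \in T,\ \nexists\, w \in T : (v, w) \in R\}, \\
E &= \{\{v, w\} \mid v, w \in T,\ v \ne w,\ M(v) = M(w)\}, \\
\mu &: V \to \mathbb{N}.
\end{aligned}
\]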

The vertices in $V$ represent the tasks. In addition, there are a source ($I$) and
a sink ($O$), which are two dummy vertices. All vertices in $V$ are weighted. The
weight $\mu(v)$ of a vertex is given by the processing time $p(v)$, $\mu(v) := p(v)$,
($\mu(I) = \mu(O) = 0$). The arcs in $A$ represent the given precedences between the
tasks. The edges in $E$ represent the machine capacity constraints, i.e., $\{v, w\} \in E$
with $v, w \in T$ and $M(v) = M(w)$ denotes the disjunctive constraint, and the two
ways to settle the disjunction correspond to the two possible orientations of
$\{v, w\}$. The source $I$ has arcs emanating to all the first tasks of the jobs and the
sink $O$ has arcs coming from all final tasks of jobs.

An orientation on $E$ is a function $\Omega : E \to T \times T$ such that $\Omega(\{v, w\}) \in
\{(v, w), (w, v)\}$ for each $\{v, w\} \in E$. A schedule is feasible if the corresponding
orientation $\Omega$ on $E$ ($\Omega(E) = \{\Omega(e) \mid e \in E\}$) results in a directed graph (called
digraph) $D := G' = (V, A, E, \mu, \Omega(E))$ which is acyclic.

A path $P$ from $x_i$ to $x_j$, $i, j \in \mathbb{N}$, $i < j$, $x_i, x_j \in V$, of the digraph $D$
is a sequence of vertices $(x_i, x_{i+1}, \ldots, x_j)$ in $V$ such that for all $i \le k < j$,
$[x_k, x_{k+1}] \in A$ or $(x_k, x_{k+1}) \in \Omega(E)$.

The length of a path $P(x_i, x_j)$ is defined by the sum of the weights of all
vertices in $P$: $\lambda(P(x_i, x_j)) = \sum_{k=i}^{j} \mu(x_k)$. The makespan of a feasible schedule
is determined by the length of a longest path (i.e., a critical path) in the digraph
$D$. The problem of minimizing the makespan therefore can be reduced to finding
an orientation $\Omega$ on $E$ that minimizes the length $\lambda(P_{\max})$.
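The longest-path characterization of the makespan can be sketched in a few lines. The representation below is an assumption made for illustration: an orientation is encoded as one job order per machine (`orders[i]` lists the jobs in the order machine `i` processes them), `p[j][i]` is the processing time of job `j` on machine `i`, and the instance data are made up.

```python
from functools import lru_cache


def makespan(p, orders):
    """Length of a longest path in the digraph induced by the machine orders."""
    m = len(orders)
    # Position of every job in the order of each machine.
    pos = [{job: idx for idx, job in enumerate(order)} for order in orders]

    @lru_cache(maxsize=None)
    def finish(job, machine):
        # Job precedence: the task on the previous machine must be finished.
        ready = finish(job, machine - 1) if machine > 0 else 0
        # Machine constraint: the preceding task on this machine must be finished.
        idx = pos[machine][job]
        if idx > 0:
            ready = max(ready, finish(orders[machine][idx - 1], machine))
        return ready + p[job][machine]

    # Every task precedes the last task of its job, so the last machine suffices.
    return max(finish(job, m - 1) for job in orders[m - 1])


# Hypothetical instance: 2 jobs, 2 machines.
p = [[3, 2], [1, 4]]
print(makespan(p, [[0, 1], [0, 1]]))
print(makespan(p, [[1, 0], [1, 0]]))
```

The recursion terminates because every dependency leads either to a lower machine index or to an earlier position on the same machine, which mirrors the acyclicity of the digraph for the flow shop.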

3 Basic Definitions 

Simulated annealing algorithms act within a configuration space in accordance
with a certain neighborhood structure or a set of transition rules, where
the particular steps are controlled by the value of an objective function. The
configuration space, i.e., the set of feasible solutions of a given problem instance,
is denoted by $\mathcal{F}$. For all instances, the number of tasks of each job equals the
number of machines and each job has precisely one operation on each machine.
Therefore, the size of $\mathcal{F}$ can be upper bounded in the following way: In the
disjunctive graph $G$ there are at most $l!$ possible orientations to process $l$ tasks on
a single machine. Hence, we have $|\mathcal{F}| \le (l!)^m$.

To describe the neighborhood of a solution $S \in \mathcal{F}$, we define a neighborhood
function $\eta : \mathcal{F} \to \wp(\mathcal{F})$. The neighborhood of $S$ is given by $\eta(S) \subseteq \mathcal{F}$, and each
solution in $\eta(S)$ is called a neighbor of $S$. Van Laarhoven et al. [22] propose
a neighborhood function $\eta_L$ for solving job shop scheduling problems which
is based on interchanging two adjacent tasks of a block. A block is a maximal
sequence of adjacent tasks that are processed on the same machine and belong
to a longest path. We will use their neighborhood function with the extension
that we allow changing the orientation of an arbitrary arc which connects two
tasks on the same machine:

(i) Choosing two vertices $v$ and $w$ such that
$M(v) = M(w) = k$ with $e = (v, w) \in \Omega(E)$;

(ii) Reversing the order of $e$ such that the resulting arc $e' \in \Omega'(E)$ is $(w, v)$;

(iii) If there exists an arc $(u, v)$ such that $v \ne u$, $M(u) = k$, then replace the arc
$(u, v)$ by $(u, w)$;

(iv) If there exists an arc $(w, x)$ such that $w \ne x$, $M(x) = k$, then replace the
arc $(w, x)$ by $(v, x)$.

Thus, the neighborhood structure is characterized by 

Definition 1 The schedule $S'$ is a neighbor of $S$, $S' \in \eta(S)$, if $S'$ can be obtained
by the transition rules (i) - (iv) or $S' = S$.
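The transition rules can be read as exchanging two adjacent tasks in the processing order of one machine. A minimal sketch under that reading, assuming a schedule is encoded as one job order per machine (the name `neighbors` and the encoding are illustrative, not the paper's notation):

```python
def neighbors(orders):
    """All schedules obtained by swapping two adjacent jobs on one machine.

    The schedule itself, which Definition 1 also counts as a neighbor,
    is not included in the returned list.
    """
    result = []
    for k, order in enumerate(orders):
        for idx in range(len(order) - 1):
            swapped = list(order)
            # Rule (ii): reverse the arc between the adjacent tasks v and w;
            # rules (iii) and (iv) are implicit in the sequence encoding.
            swapped[idx], swapped[idx + 1] = swapped[idx + 1], swapped[idx]
            candidate = [list(o) for o in orders]
            candidate[k] = swapped
            result.append(candidate)
    return result


# Hypothetical schedule with m = 3 machines and l = 4 jobs.
start = [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]]
print(len(neighbors(start)))
```

With $m$ machines and $l$ jobs there are $m \cdot (l - 1)$ such swaps, and applying the same swap again restores the original schedule, which is the reversibility property discussed in the text.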

Our choice is motivated by two facts: 

• In contrast to job shop scheduling, the transition rules guarantee for
the flow shop a priori that the resulting schedule is feasible, i.e., that the
corresponding digraph is acyclic.

• The extension of allowing to reverse the orientation of an arbitrary arc leads
to an important property of the neighborhood function, namely reversibility.

Thus, the neighborhood structure is such that the algorithm visits only digraphs 
corresponding to feasible solutions and is equipped with a symmetry property 
which is required by our convergence analysis. 

Lemma 1 Suppose that $e = (v, w) \in \Omega(E)$ is an arbitrary arc of an acyclic
digraph $D$. Let $D'$ be the digraph obtained from $D$ by reversing the arc $e$. Then
$D'$ is also acyclic.

Proof: Suppose $D'$ is cyclic. Because $D$ is acyclic, the arc $(w, v)$ is part of
the cycle in $D'$. Consequently, there is a path $P = (v, x_1, x_2, \ldots, x_i, w)$ in $D'$.
Since $w$ is processed before $v$ on machine $M_k$, at least two arcs of the path $P$
are connecting vertices of the same job. From the definition of the flow shop
problem, that implies at least two vertices have a task number greater than $k$.
Neither within a job nor within a machine is there an arc $(y, z)$ such that the
task number of $y$ is greater than the task number of $z$. This contradicts that the
path $P$ exists in $D'$. Hence, $D'$ is acyclic. q.e.d.

As already mentioned in Section 2, the objective is to minimize the makespan
of feasible schedules. Hence, we define $Z(S) := \lambda(P_{\max})$, where $P_{\max}$ is a longest
path in $D(S)$. Furthermore, we set

\[ \mathcal{F}_{\min} := \{ S \mid S \in \mathcal{F} \ \text{and}\ \forall S' \, ( S' \in \mathcal{F} \to Z(S') \ge Z(S) ) \}. \tag{2} \]

For the special case of $\eta_L$, Van Laarhoven et al. have proved the following

Theorem 1 [22] For each schedule $S \notin \mathcal{F}_{\min}$, there exists a finite sequence of
transitions leading from $S$ to an element of $\mathcal{F}_{\min}$.



The probability of generating a solution $S'$ from $S$ can be expressed by

\[ G[S, S'] := \begin{cases} \dfrac{1}{|\eta(S)|}, & \text{if } S' \in \eta(S), \\[1ex] 0, & \text{otherwise,} \end{cases} \tag{3} \]

with $|\eta(S)| \le n - m + 1$, which follows from Definition 1.

Since the neighborhood function $\eta_L$ from [22] is a special case of our transition
rules (i) - (iv), we have:

Lemma 2 Given $S \in \mathcal{F} \setminus \mathcal{F}_{\min}$, there exists $S' \in \eta(S)$ such that $G[S, S'] > 0$.

The acceptance probability $H[S, S']$, $S' \in \eta(S) \subseteq \mathcal{F}$, is given by:



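\[ H[S, S'] := \begin{cases} 1, & \text{if } Z(S') - Z(S) \le 0, \\[1ex] e^{-\frac{Z(S') - Z(S)}{c}}, & \text{otherwise,} \end{cases} \tag{4} \]

where $c$ is a control parameter having the interpretation of a temperature in
annealing procedures. Finally, the probability of performing the transition between
$S$ and $S'$, $S, S' \in \mathcal{F}$, is defined by

\[ \Pr\{S \to S'\} = \begin{cases} G[S, S'] \cdot H[S, S'], & \text{if } S' \ne S, \\[1ex] 1 - \sum_{Q \ne S} G[S, Q] \cdot H[S, Q], & \text{otherwise.} \end{cases} \tag{5} \]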
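How the generation and acceptance probabilities combine into one row of the transition matrix can be sketched as follows. This is an illustration only: it assumes uniform generation over a neighborhood that contains the schedule itself, and the function names and cost values are hypothetical.

```python
import math


def acceptance(z_old, z_new, c):
    """Acceptance probability: 1 for non-increasing cost, exp(-increase / c) otherwise."""
    delta = z_new - z_old
    return 1.0 if delta <= 0 else math.exp(-delta / c)


def transition_row(z_s, neighbor_costs, c):
    """Return (probabilities of moving to each proper neighbor, probability of staying)."""
    size = len(neighbor_costs) + 1  # assumption: the schedule itself is in its neighborhood
    moves = [acceptance(z_s, z_q, c) / size for z_q in neighbor_costs]
    stay = 1.0 - sum(moves)
    return moves, stay


# Hypothetical makespans: current schedule 10, three proper neighbors.
moves, stay = transition_row(10, [8, 10, 13], c=3.0)
print(moves, stay)
```

Downhill and level moves are always accepted once generated, while the uphill move to cost 13 is accepted with probability $e^{-1}$ at this temperature; the remaining mass stays on the current schedule.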

Let $a_S(k)$ denote the probability of being in the configuration $S$ after $k$ steps
performed for the same value of $c$. The probability $a_S(k)$ can be calculated in
accordance with

\[ a_S(k) := \sum_{Q} a_Q(k - 1) \cdot \Pr\{Q \to S\}. \tag{6} \]

The recursive application of (6) defines a Markov chain of probabilities $a_S(k)$.
If the parameter $c = c(k)$ is a constant $c$, the chain is said to be a homogeneous
Markov chain; otherwise, if $c(k)$ is lowered at any step, the sequence of
probability vectors $a(k)$ is an inhomogeneous Markov chain.
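One application of the recursion (6) is a vector-matrix product. A minimal sketch on a made-up two-state chain; the matrix `P` and the name `step` are illustrative and unrelated to any concrete scheduling instance.

```python
def step(a, P):
    """One step of the recursion: new_a[s] = sum over q of a[q] * P[q][s]."""
    n = len(a)
    return [sum(a[q] * P[q][s] for q in range(n)) for s in range(n)]


# Hypothetical transition matrix; row q holds the probabilities Pr{q -> s}.
P = [[0.5, 0.5],
     [0.2, 0.8]]

a0 = [1.0, 0.0]   # start in state 0 with certainty
a1 = step(a0, P)
a2 = step(a1, P)
print(a1, a2)
```

For an inhomogeneous chain the matrix would be recomputed from the current temperature before every call, instead of being reused.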

We consider a cooling schedule which defines a special type of inhomogeneous
Markov chains. For this cooling schedule, the value $c(k)$ changes in accordance
with

\[ c(k) = \frac{\Gamma}{\ln(k + 2)}, \quad k = 0, 1, \ldots . \tag{7} \]

The choice of $c(k)$ is motivated by Hajek's Theorem [11] on logarithmic cooling
schedules for inhomogeneous Markov chains. If there exist $S_0, S_1, \ldots, S_r \in
\mathcal{F}$ ($S_0 = S \wedge S_r = S'$) such that $G[S_u, S_{u+1}] > 0$, $u = 0, 1, \ldots, (r - 1)$, and
$Z(S_u) \le h$ for all $u = 0, 1, \ldots, r$, we denote $\mathrm{height}(S \Rightarrow S') \le h$. The schedule
$S$ is a local minimum if $S \in \mathcal{F} \setminus \mathcal{F}_{\min}$ and $Z(S') > Z(S)$ for all $S' \in \eta_L(S) \setminus S$.
By $\mathrm{depth}(S_{\min})$ we denote the smallest $h$ such that there exists an $S' \in \mathcal{F}$, where
$Z(S') < Z(S_{\min})$, which is reachable at height $Z(S_{\min}) + h$.

The following convergence property has been proved by B. Hajek:

Theorem 2 [11] Given a configuration space $C$ and a cooling schedule defined
by

\[ c(k) = \frac{\Gamma}{\ln(k + 2)}, \quad k = 0, 1, \ldots, \]

the asymptotic convergence $\sum_{H \in C_{\min}} a_H(k) \to 1$ (for $k \to \infty$) of the stochastic algorithm, which
is based on (2), (4), and (5), is guaranteed if and only if

(i) $\forall H, H' \in C \ \exists H_0, H_1, \ldots, H_r \in C$ ($H_0 = H \wedge H_r = H'$): $G[H_u, H_{u+1}] > 0$,
$u = 0, 1, \ldots, (r - 1)$;

(ii) $\forall h$: $\mathrm{height}(H \Rightarrow H') \le h \iff \mathrm{height}(H' \Rightarrow H) \le h$;

(iii) $\Gamma \ge \max_{H_{\min}} \mathrm{depth}(H_{\min})$.



The condition (i) expresses the connectivity of the configuration space. As
already mentioned above, with the choice of our neighborhood relation we can
guarantee the mutual reachability of schedules. Therefore, Hajek's Theorem can
be applied to our configuration space $\mathcal{F}$ with the neighborhood relation $\eta$.
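To make the setting concrete, the following is a minimal sketch of a simulated annealing loop with the logarithmic cooling schedule $c(k) = \Gamma / \ln(k + 2)$ on a tiny flow shop instance. It is an illustration under several assumptions rather than the algorithm analyzed in the paper: schedules are encoded as one job order per machine, a move swaps two adjacent jobs on a randomly chosen machine, and the instance, the value of `gamma`, the step count, and all function names are made up.

```python
import math
import random
from functools import lru_cache


def makespan(p, orders):
    """Longest-path length for the machine orders (p[j][i]: job j on machine i)."""
    m = len(orders)
    pos = [{job: idx for idx, job in enumerate(order)} for order in orders]

    @lru_cache(maxsize=None)
    def finish(job, machine):
        ready = finish(job, machine - 1) if machine > 0 else 0
        idx = pos[machine][job]
        if idx > 0:
            ready = max(ready, finish(orders[machine][idx - 1], machine))
        return ready + p[job][machine]

    return max(finish(job, m - 1) for job in orders[m - 1])


def anneal(p, orders, gamma, steps, seed=0):
    """Return the best makespan seen during `steps` annealing steps."""
    rng = random.Random(seed)
    current = [list(o) for o in orders]
    z = makespan(p, current)
    best = z
    for k in range(steps):
        c = gamma / math.log(k + 2)            # logarithmic cooling schedule
        machine = rng.randrange(len(current))
        idx = rng.randrange(len(current[machine]) - 1)
        cand = [list(o) for o in current]
        cand[machine][idx], cand[machine][idx + 1] = (
            cand[machine][idx + 1], cand[machine][idx])
        z_new = makespan(p, cand)
        # Accept improvements always, deteriorations with probability exp(-delta / c).
        if z_new <= z or rng.random() < math.exp(-(z_new - z) / c):
            current, z = cand, z_new
            best = min(best, z)
    return best


# Hypothetical instance: 2 jobs, 2 machines; initial makespan 9, optimum 7.
p = [[3, 2], [1, 4]]
best = anneal(p, [[0, 1], [0, 1]], gamma=3.0, steps=200)
print(best)
```

The sketch tracks the best makespan seen so far, which is a common practical addition; the convergence statement in the text concerns the distribution of the current schedule, not this running minimum.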

Before we perform the convergence analysis of the logarithmic cooling schedule
defined in (7), we point out some properties of the configuration space and
the neighborhood function. Let $S$ and $S'$ be feasible schedules and $S' \in \eta(S)$.
To obtain $S'$ from $S$, we chose the arc $e = (v, w)$ with $M(v) = M(w)$.

If $Z(S) < Z(S')$, then only a path containing one of the selected vertices
$v, w$ can determine the new makespan after the transition move. It can be shown
that all paths whose length increases contain the edge $e' = (w, v)$. Therefore, we
have the following upper bound.

Lemma 3 The increase of the objective function $\Delta Z$ in a single step according
to $\eta$ ($S \to_\eta S'$) can be upper bounded by $(p(v) + p(w))$.

The reversibility of the neighborhood function implies for the maximum distance
of neighbors $S' \in \eta(S)$ to $\mathcal{F}_{\min}$ in relation to $S$ itself: If the minimum
number of transitions to reach from $S$ an optimum element is $N$, then for any
$S' \in \eta(S)$ the minimum number of transitions is at most $N + 1$.



4 Convergence Analysis 

Our convergence results will be derived from a careful analysis of the "exchange
of probabilities" among feasible solutions which belong to adjacent distance levels
to optimum schedules, i.e., in addition to the value of the objective function, the
elements of the configuration space are further distinguished by the minimal
number of transitions required to reach an optimum schedule. We first introduce
a recurrent formula for the expansion of probabilities and then we prove the main
result on the convergence rate, which relates properties of the configuration space
to the speed of convergence. Throughout the section we employ the fact that
for a proper choice of $\Gamma$ the logarithmic cooling schedule leads to an optimum
solution.

To express the relation between $S$ and $S'$ according to their value of the
objective function we will use $<_Z$, $>_Z$, and $=_Z$ to simplify the expressions:

$S <_Z S'$ instead of $S' \in \eta(S) \ \&\ (Z(S) < Z(S'))$,

$S >_Z S'$ instead of $S' \in \eta(S) \ \&\ (Z(S) > Z(S'))$,

$S =_Z S'$ instead of $S \ne S' \ \&\ S' \in \eta(S) \ \&\ (Z(S) = Z(S'))$.

Furthermore, we denote:

\[ p(S) := |\{S' : S <_Z S'\}|, \quad q(S) := |\{S' : S =_Z S'\}|, \quad r(S) := |\{S' : S >_Z S'\}|. \]



These notations imply

\[ p(S) + q(S) + r(S) = |\eta(S)| - 1 = l \cdot m - m - 1. \tag{8} \]

The equation is valid because there are $m \cdot (l - 1)$ arcs which are allowed to
be switched and $S$ belongs to its own neighborhood. Therefore, the size of the
neighborhood is independent of the particular schedule $S$, and we set $n' :=
m \cdot l - m$.

Now, we analyze the probability $a_S(k)$ to be in the schedule $S \in \mathcal{F}$ after $k$
steps of the logarithmic cooling schedule defined in (7), and we use the notation

\[ \frac{1}{(k + 2)^{\frac{Z(S) - Z(S')}{\Gamma}}} = e^{-\frac{Z(S) - Z(S')}{c(k)}}, \quad k \ge 0. \tag{9} \]
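The identity (9) follows directly from $c(k) = \Gamma / \ln(k + 2)$ and can be confirmed numerically. A short sketch; the function names and the sample values of the cost difference, $k$ and $\Gamma$ are arbitrary choices for illustration.

```python
import math


def temperature(k, gamma):
    """Logarithmic cooling schedule c(k) = gamma / ln(k + 2)."""
    return gamma / math.log(k + 2)


def uphill_acceptance(delta, k, gamma):
    """exp(-delta / c(k)) for a cost increase delta at step k."""
    return math.exp(-delta / temperature(k, gamma))


# Hypothetical values: cost increase 3, step k = 10, Gamma = 5.
lhs = uphill_acceptance(3, 10, 5.0)
rhs = (10 + 2) ** (-3 / 5.0)
print(lhs, rhs)
```

The acceptance probability for a fixed increase therefore decays only polynomially in $k$, with the exponent given by the increase divided by $\Gamma$.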



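By using (3) to (5), one obtains from (6) by straightforward calculations

\[
\begin{aligned}
a_S(k) ={}& a_S(k-1) \cdot \Biggl( \frac{p(S) + 1}{n'} - \sum_{\substack{i = 1 \\ S <_Z S_i}}^{p(S)} \frac{1}{n'} \cdot \frac{1}{(k+1)^{\frac{Z(S_i) - Z(S)}{\Gamma}}} \Biggr) \\
&+ \sum_{\substack{i = 1 \\ S_i \ge_Z S}}^{p(S) + q(S)} \frac{a_{S_i}(k-1)}{n'} + \sum_{\substack{j = 1 \\ S_j <_Z S}}^{r(S)} \frac{a_{S_j}(k-1)}{n'} \cdot \frac{1}{(k+1)^{\frac{Z(S) - Z(S_j)}{\Gamma}}}.
\end{aligned}
\tag{10}
\]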

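The representation (expansion) will be used in the following as the main relation
reducing $a_S(k)$ to probabilities from previous steps. We introduce the following
partition of the set of schedules with respect to the value of the objective function:

\[ L_0 := \mathcal{F}_{\min} \quad \text{and} \quad L_{h+1} := \Bigl\{ S : S \in \mathcal{F} \setminus \bigcup_{i=0}^{h} L_i \ \wedge\ \forall S' \Bigl( S' \in \mathcal{F} \setminus \bigcup_{i=0}^{h} L_i \to Z(S') \ge Z(S) \Bigr) \Bigr\}. \]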







The highest level within $\mathcal{F}$ is denoted by $L_{h_{\max}}$. Given $S \in \mathcal{F}$, we further denote
by $W_{\min}(S) := [S, S_{k-1}, \ldots, S']$ a shortest sequence of transitions from $S$ to
$\mathcal{F}_{\min}$, i.e., $S' \in \mathcal{F}_{\min}$. Thus, we have for the distance $d(S) := \mathrm{length}(W_{\min}(S))$.
We introduce another partition of $\mathcal{F}$ with respect to $d(S)$:

\[ S \in D_i \iff d(S) = i \ge 0, \quad \text{and} \quad \mathcal{D}_s = \bigcup_{i=0}^{s} D_i, \ \text{i.e., } \mathcal{F} = \mathcal{D}_s. \]

Thus, we distinguish between distance levels $D_i$, related to the minimal number
of transitions required to reach an optimal schedule from $\mathcal{F}_{\min}$, and the levels $L_h$,
which are defined by the objective function. By definition, we have $D_0 := L_0 =
\mathcal{F}_{\min}$. We will use the following abbreviations:



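\[ f(S', S, t) := \frac{1}{(k + 2 - t)^{\frac{Z(S') - Z(S)}{\Gamma}}} \tag{11} \]

and

\[ K_S(k - t) := \frac{p(S) + 1}{n'} - \sum_{\substack{i = 1 \\ S <_Z S_i}}^{p(S)} \frac{1}{n'} \cdot \frac{1}{(k + 2 - t)^{\frac{Z(S_i) - Z(S)}{\Gamma}}}. \tag{12} \]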

We are going backwards from the $k$-th step and expanding $a_S(k)$ in accordance
with (10). Our aim is to find a close upper bound for the value $\sum_{S \notin D_0} a_S(k)$
in terms of probabilities from previous steps.

During the expansion (10) of $a_S(k)$, $S \notin D_0$, terms according to $S$ are generated
as well as according to all neighbors $S'$ of $S$. Some terms generated by the
expansion of $S$ contain the factor $a_{S'}(k - 1)$ and can therefore be summarized
with terms generated by the expansion of $S'$. However, it is important to distinguish
between elements from $D_1$ and elements from $D_i$, $i \ge 2$. For all $S \notin D_1$,
we obtain the following:

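\[
a_S(k-1) \cdot \Biggl( \frac{p(S) + 1 + q(S) + r(S)}{n'} - \sum_{\substack{i = 1 \\ S <_Z S_i}}^{p(S)} \frac{1}{n'} \cdot \frac{1}{(k+1)^{\frac{Z(S_i) - Z(S)}{\Gamma}}} + \sum_{\substack{i = 1 \\ S <_Z S_i}}^{p(S)} \frac{1}{n'} \cdot \frac{1}{(k+1)^{\frac{Z(S_i) - Z(S)}{\Gamma}}} \Biggr) = a_S(k-1).
\tag{13}
\]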



In case of $S \in D_1$, some neighbors $S'$ of $S$ are elements of $D_0$ and do not generate
the terms related to $S >_Z S'$ because the $a_{S'}(k)$ are not expanded since they
are not present in the sum $\sum_{S \notin D_0} a_S(k)$. Therefore, $r'(S) \le r(S)$ many terms
are missing for $S \in D_1$ and the following arithmetic term is generated:

\[ a_S(k-1) \cdot \Bigl( 1 - \frac{r'(S)}{n'} \Bigr), \tag{14} \]

where $r'(S) := |\{S' : S' \in \eta(S) \wedge S' \in D_0\}|$. On the other hand, the expansion of
$a_S(k)$ generates terms related to $S' \in D_0$ with $S' <_Z S$ and containing $a_{S'}(k - 1)$
as a factor. Those terms are not canceled by expansions of $a_{S'}(k)$. All $S \in D_1$
therefore generate the following term:

\[ \sum_{\substack{j = 1 \\ S_j \in D_0 \cap \eta(S)}}^{r'(S)} \frac{a_{S_j}(k-1)}{n'} \cdot \frac{1}{(k+1)^{\frac{Z(S) - Z(S_j)}{\Gamma}}}. \tag{15} \]



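Now, we consider the entire sum and take the negative product $a_S(k-1) \cdot r'(S)/n'$
separately. By using the abbreviations introduced in (11) and (12) we derive the following
lemma.

Lemma 4 After one step of the expansion of $\sum_{S \notin D_0} a_S(k)$, the sum can be
represented by

\[
\sum_{S \notin D_0} a_S(k) = \sum_{S \notin D_0} a_S(k-1) - \sum_{S \in D_1} \frac{r'(S)}{n'} \cdot a_S(k-1) + \sum_{S \in D_1} \sum_{\substack{j = 1 \\ S_j \in D_0 \cap \eta(S)}}^{r'(S)} \frac{f(S, S_j, 1)}{n'} \cdot a_{S_j}(k-1).
\]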



The diminishing factor $(1 - r'(S)/n')$ appears by definition for all elements
of $D_1$. At subsequent reduction steps, the factor is "transmitted" successively
to all probabilities from higher distance levels $D_i$ because any element of $D_i$
has at least one neighbor from $D_{i-1}$. The main task is now to analyze how this
diminishing factor changes if it is propagated to the next higher distance level.
We denote

\[ \sum_{S \notin D_0} a_S(k) = \sum_{S \notin D_0} \mu(S, t) \cdot a_S(k - t) + \sum_{S' \in D_0} \mu(S', t) \cdot a_{S'}(k - t), \tag{16} \]

i.e., the coefficients $\mu(S, t)$ and $\mu(S', t)$ are the factors at probabilities after $t$
steps of an expansion of $\sum_{S \notin D_0} a_S(k)$. Hence, for $S \in D_1$ we have $\mu(S, 1) =
1 - r'(S)/n'$, and $\mu(S, 1) = 1$ for the remaining $S \in \mathcal{D}_s \setminus (D_0 \cup D_1)$. For $S' \in D_0$
we have from Lemma 4:



(17) 



//(S',1) 



P(S') 

E 



2=1 

Si e Dias' e ASi) 



n' 



Starting from step $(k-1)$, the generated probabilities $a_{S'}(k-u)$ are expanded in
the same way as all other probabilities. We set $\nu(S,j) := 1 - \mu(S,j)$ because we
are mainly interested in the convergence $\mu(S,j) \to 0$. We perform an inductive
step from $(k-t+1)$ to $(k-t)$ and obtain for $t \ge 2$:

Lemma 5 The following recurrent relation is valid for the coefficients $\nu(S,t)$,
$t \ge 2$:

n{S,t) = n(S,t-l)-Ksik-t) ' f{S” , S,t). 

S>zS' S<zS" 

Furthermore, for the three special cases $S \in D_j$, $j > t$; $S \in D_1$, $t = 1$; and
$S \in D_0$, $t = 1$ we have $\nu(S,t) = 0$, $\nu(S,t) = r'(S)/n'$, and $\nu(S,t) = 1 -
f(S_j,S,1)/n'$, with $S_j \in D_1 \wedge S \in \eta(S_j)$, respectively.

Exactly the same structure of the equation is valid for p{S, t) which will be used 
for elements of Do only because these elements are not present in the original 
sum '^s^Do ^s{k). Now, any n{S, t) and p,{S, t) is expressed by a sum Tu of 
arithmetic terms. We consider in more details the terms associated with elements 
S° of Do and of Di. We assume a representation jj,{S°,t — 1) = ^T{S°), 
and n{S, t - 1) = E T{S), S ^ Do- 
ff we consider r'{S'^)/n' and Eso<^ 5 i /(5'^, S°, t)/n' separately, the difficul- 
ties arising from the definition v{S, t) := 1 — p{S, t) can be avoided, i.e., we have 
to take into account only changing signs of terms during the transmission from 
D\ to Do and vice versa. 

Definition 2 The two expressions r' (S^) /n' , and'^go^^gi f{S^,S°,t)/n', are 
called source terms of i/{S^,t) and p{S°,t), respectively. 

During an expansion of ^s{k) backwards according to (13), the source 

terms are distributed permanently to higher distance levels Dj. Therefore, at 
higher distance levels the notion of a source term can be defined by an inductive 
step: 

Definition 3 For all $S \in D_i$, $i > 1$, any term which is generated according
to the equation of Lemma 5 from a source term of $\nu(S',t-1)$, where $S' \in
D_{i-1} \cap \eta(S)$, is said to be a source term of $\nu(S,t)$.

We introduce a counter e(T) to terms T which indicates the step at which the 
term has been generated from source terms. The value e(T) is called the rate of 
a term and we set e(T) = 1 for source terms T. 

The value e(T) > 1 is assigned to terms related to Do and D\ in a slightly 
different way compared to higher distance levels because at the first step the S° 
do not participate in the expansion of Es^Dq ^s{k). Furthermore, in the case 
of Do and Di we have to take into account the changing signs of terms which 
result from the simultaneous consideration of n{S^,t) (for Di) and ix{S°,t) (for 
Do). 

Definition 4 A term T° is called a rate term of ix{S°,t), S° G Do and 
j > 2, if either S° = —T and e(T) = j — I for some v{S, t — 1), S E Di ntj(S°), 
or e(T°) = j — 1 for some pfS' ,t — 1), S' E DoF r]{S°). 

A term T is called a rate term of v{S, t), S^ E D\ and j > 2, if e(T) = 
j — 2 for some v{S' , t — 1), S" G D 2 n rj{S^), e(T) = j — I for some v{S' , t — 1), 
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S" G n r]{S^), or T = — T' and e(T') = j — \ for some S° £ DqC] r]{S^) with 
respect to ir{S°,t — 1). 

A term T is called a rate term of v{S,t), S Di and i, j > 2, if e{T) = 
j —1 for some v{S', t — 1 ), S' € Di+i r]r]{S), e(T) = j — I for some v{S' ,t~ 1 ); 
S' £ Dif) r]{S), or T is a rate term of v{S, t — 1) for some S € . 

The classification of terms will be used for a partition of the summation over 
all terms which constitute particular values j/{S^,t) and ir{S°,t)- Let Tj{S,t) be 
the set of rate arithmetic terms of v{S^,t) ih-{‘S°,t)) related to S' G Vg. We 
set 

(18) A,(S,t):= 5] T. 

T€Tj{S,t) 

The same notation is used in case of S = S° G I?o with respect to /x(S^,t), and 
we obtain by induction 

t-i+l t 

(19) iy{S,t)= XI 

i=i i=i 

For S £ Di ^ D\, Do and j > 2 we obtain immediately from Lemma 5 and 
Definition 4: 



(20) Aj{S,t) = Aj^i{S,t-l)-Ks{k-t) + 



+ ^ -f {S’, S,t) 



s' <z S 
S' 6 Di-^i 



S' >z S 
S’ 6 A+i 



A,^,{S',t-l) A,^,{sy-1) ,f^g,^s,t) 



s' <Z s 
S' 6 Di 



S' >z S 
S' 6 Di 



A^{S\t^^ ^ 



s' <z s 
s' 6 A -1 



s’ >Z s 
s' € A-i 



n' 



In case of S G Di and j > 2, we have in accordance with Definition 4: 
(21) Aj{S,t) = Aj^i{S,t - 1) ■ Ks{k - t) + 



Aj-l(S^t - 1) ^ ^ 



^.f{S\S,t) 



s' <Z s 
s' € Di 



S' >z S 
S' 6 Di 






s’ >Z s 
S’ e D2 



S'eDonrjiS) 
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Finally, the corresponding relation for S° is given by 
(22) A,{S°,t) = ■ Kso{k-t) 

s>^s» ” 

We incorporate (20) till (22) in the following upper bound: 

Lemma 6 Given S £ T, k > , there exist constants a,b,c> 1 sueh that 

\A,{S,t)\<r-2->^''\ 



where j > k/c is required. 

The proof is performed by induction, using the representations (20) till (22) for 
increasing i and j, and we employ similar relations as in [19], Lemma 10 and 
Lemma 11. In the present case, we utilize the reversibility of the neighborhood 
relation and therefore the lower bound on k depends directly on T. 

We compare the computation of $\nu(S,t)$ (and $\mu(S',t)$) for two different values
$t = k_1$ and $t = k_2$, i.e., $\nu(S,t)$ is calculated backwards from $k_1$ and $k_2$,
respectively. To distinguish between $\nu(S,t)$ and related values, which are defined
for different $k_1$ and $k_2$, we will use an additional upper index. At this point, we
use again the representation (19) of $\nu(S,t)$ (and the corresponding equation for
$\mu(S',t)$).

Lemma 7 Given k 2 > k\ and S £ Di, then 

A2(S', t) = A^S, k2 — ki + t), if t > i 2. 

The proposition can be proved straightforward by induction over i, i.e., the sets 
Di. Lemma 7 implies that at step s + 2 (with respect to ki) 

Al{S,s + 2) = Al{S,k 2 -ki + s + 2) for all S e 

For A|(S', t), the corresponding equality is already satisfied in case of t > s. The 
relation can be extended to all values j > 2: 

Lemma 8 Given k 2 > ki, j > 1, and S £ Di, then 

A](S', t) = A|(S', k2-ki + t), if t > 2 - {j -l)+i. 

We recall that our main goal is to upper bound the sum ^s{k). When 

a(0) denotes the initial probability distribution, we have from (16): 

(23) I ^s{ki)- ^s{k2)\ < 

S£Do S£Do 

< K^,fci))-as(o)i+i( 5: ljfS',ki)~ ^ ijfS',ki)) ■as'(0)\ ■ 
S£Do S'€Do S'€Do 
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Lemma 9 Given k 2 > k\ > then 

, , 1/13 

I ^ {n{S,k2) - iy{S,ki))-as{0)\<2-ki 

S^Do 

for a suitable constant [3 > 1. 

The proof follows straightforwardly from Lemma 6 and Lemma 8. The same relation
can be derived for the $\mu(S',k_{1/2})$. Now, we can immediately prove

Theorem 3 The condition 

k > • log«(i) i 

0 

implies for arbitrary initial probability distributions $a(0)$ and $\delta > 0$:

$\sum_{S \notin D_0} a_S(k) < \delta$ and therefore, $\sum_{S' \in D_0} a_{S'}(k) \ge 1 - \delta$.

Proof: We choose k in accordance with Lemma 9 and we consider 
as{k)= (as{k) - as{k 2 )) + as{k 2 ) 

S^Do S^Do S^Do 

= HS,k2)-n{S,k))-as{0) + 

S^Do 

+ {t^{S\k)-p{S\k2))-as>{0)+ ^s{k2). 

S'€Do S^Do 

The value $k_2$ from Lemma 9 is larger but independent of $k_1 = k$, i.e., we can
take a $k_2 > k$ such that $\sum_{S \notin D_0} a_S(k_2) < \delta/3$. Here, we employ
Theorem 1 and 2, i.e., if the constant T from (7) is sufficiently large, the
inhomogeneous simulated annealing procedure defined by (3) till (5) tends to the
global minimum of Z on T. We obtain the stated inequality, if additionally both
differences $\sum_{S \notin D_0} (\nu(S,k_2) - \nu(S,k))$ and
$\sum_{S' \in D_0} (\mu(S',k) - \mu(S',k_2))$ are smaller than $\delta/3$. Lemma 9
implies that the condition on the differences is satisfied in case of
$k^{1/\beta} > \log(3/\delta)$. q.e.d.

References 

1. E.H.L. Aarts. Local Search in Combinatorial Optimization. Wiley, New York, 1997.

2. E.H.L. Aarts, P.J.M. Van Laarhoven, J.K. Lenstra, and N.L.J. Ulder. A Computational
Study of Local Search Algorithms for Shop Scheduling. ORSA J. on
Computing, 6:118-125, 1994.

3. E.H.L. Aarts and J.H.M. Korst. Simulated Annealing and Boltzmann Machines:
A Stochastic Approach. Wiley, New York, 1989.

4. H. Baumgartel. Distributed Constraint Processing for Production Logistics. In
PACT'97 - Practical Application of Constraint Technology, Blackpool, UK, 1997.

5. O. Catoni. Rough Large Deviation Estimates for Simulated Annealing: Applications
to Exponential Schedules. Annals of Probability, 20(3):1109-1146, 1992.

6. O. Catoni. Metropolis, Simulated Annealing, and Iterated Energy Transformation
Algorithms: Theory and Experiments. J. of Complexity, 12(4):595-623, 1996.

7. C. Chen, V.S. Vempati, and N. Aljaber. An Application of Genetic Algorithms for
Flow Shop Problems. European J. of Operational Research, 80:389-396, 1995.

8. P. Chretienne, E.G. Coffman, Jr., J.K. Lenstra, and Z. Liu. Scheduling Theory and
Its Applications. Wiley, New York, 1995.

9. M.K. El-Najdawi. Multi-Cyclic Flow Shop Scheduling: An Application in Multi-Stage,
Multi-Product Production Processes. International J. of Production Research,
35:3323-3332, 1997.

10. M.R. Garey, D.S. Johnson, and R. Sethi. The Complexity of Flow Shop and Job
Shop Scheduling. Mathematics of Operations Research, 1:117-129, 1976.

11. B. Hajek. Cooling Schedules for Optimal Annealing. Mathematics of Operations
Research, 13:311-329, 1988.

12. L.A. Hall. Approximability of Flow Shop Scheduling. In 36th Annual Symposium
on Foundations of Computer Science, pp. 82-91, Milwaukee, Wisconsin, 1995.

13. C.Y. Lee and L. Lei, editors. Scheduling: Theory and Applications. Annals of
Operations Research, Journal Edition. Baltzer Science Publ. BV, Amsterdam, 1997.

14. G. Liu, P.B. Luh, and R. Resch. Scheduling Permutation Flow Shops Using The
Lagrangian Relaxation Technique. Annals of Operations Research, 70:171-189, 1997.

15. E. Nowicki and G. Smutnicki. The Flow Shop with Parallel Machines: A Tabu
Search Approach. European J. of Operational Research, 106:226-253, 1998.

16. M. Pinedo. Scheduling: Theory, Algorithms, and Systems. Prentice Hall International
Series in Industrial and Systems Engineering. Prentice Hall, Englewood
Cliffs, N.J., 1995.

17. B. Roy and B. Sussmann. Les problemes d'Ordonnancement avec Contraintes
Disjonctives. Note DS No. 9 bis. SEMA, 1964.

18. D.L. Santos, J.L. Hunsucker, and D.E. Deal. Global Lower Bounds for Flow Shops
with Multiple Processors. European J. of Operational Research, 80:112-120, 1995.

19. K. Steinhofel, A. Albrecht, and C.K. Wong. On Various Cooling Schedules for
Simulated Annealing Applied to the Job Shop Problem. In M. Luby, J. Rolim,
and M. Serna, editors, Proc. RANDOM'98, pages 260-279, Lecture Notes in
Computer Science, vol. 1518, 1998.

20. K. Steinhofel, A. Albrecht, and C.K. Wong. Two Simulated Annealing-Based
Heuristics for the Job Shop Scheduling Problem. European J. of Operational
Research, 118(3):524-548, 1999.

21. J.D. Ullman. NP-Complete Scheduling Problems. J. of Computer and System
Science, 10(3):384-393, 1975.

22. P.J.M. Van Laarhoven, E.H.L. Aarts, and J.K. Lenstra. Job Shop Scheduling by
Simulated Annealing. Operations Research, 40(1):113-125, 1992.

23. D.P. Williamson, L.A. Hall, J.A. Hoogeveen, C.A.J. Hurkens, J.K. Lenstra,
S.V. Sevast'janov, and D.B. Shmoys. Short Shop Schedules. Operations Research,
45:288-294, 1997.




On the Lovasz Number of Certain Circulant Graphs

Valentin E. Brimkov^1, Bruno Codenotti^2, Valentino Crespi^1, and
Mauro Leoncini^{2,3}

^1 Department of Mathematics, Eastern Mediterranean University,
Famagusta, TRNC
{brimkov,crespi}.as@mozart.emu.edu.tr
^2 Istituto di Matematica Computazionale del CNR,
Via S. Maria 46, 56126-Pisa, Italy
{codenotti,leoncini}@imc.pi.cnr.it
^3 Facolta di Economia, Universita di Foggia,
Via IV Novembre 1, 71100 Foggia, Italy



Abstract. The theta function of a graph, also known as the Lovasz 
number, has the remarkable property of being computable in polynomial 
time, despite being “sandwiched” between two hard to compute integers, 
i.e., clique and chromatic number. Very little is known about the explicit 
value of the theta function for special classes of graphs. In this paper we 
provide the explicit formula for the Lovasz number of the union of two 
cycles, in two special cases, and a practically efficient algorithm, for the 
general case. 



1 Introduction 

The notion of capacity of a graph was introduced by Shannon in [14], and after
that labeled as Shannon capacity. This concept arises in connection with a graph
representation for the problem of communicating messages in a zero-error channel.
One considers a graph G, whose vertices are letters from a given alphabet,
and where adjacency indicates that two letters can be confused. In this setting,
the maximum number of one-letter messages that can be sent without danger
of confusion is given by the independence number of G, here denoted by $\alpha(G)$.
If $\alpha(G^k)$ denotes the maximum number of k-letter messages that can be safely
communicated, we see that $\alpha(G^k) \ge \alpha(G)^k$. Moreover one can readily show that
equality does not hold in general (see, e.g., [11]). The Shannon capacity of G
is the number $\Theta(G) = \lim_{k \to \infty} \sqrt[k]{\alpha(G^k)}$, which, by the previous
observations, satisfies $\Theta(G) \ge \alpha(G)$, where equality does not need to occur.
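The strict inequality can be checked by hand on the pentagon $C_5$, the classical example. The following Python sketch is an illustration only; the function names are made up here and nothing in it comes from the paper. It verifies that five 2-letter words over the pentagon alphabet are pairwise non-confusable, while $\alpha(C_5)^2 = 4$.

```python
from itertools import combinations


def strong_product_adjacent(u, v, n=5):
    """Two distinct 2-letter words can be confused when, in each position,
    the letters are equal or adjacent on the cycle C_n."""
    def eq_or_adj(a, b):
        return a == b or (a - b) % n in (1, n - 1)
    return u != v and eq_or_adj(u[0], v[0]) and eq_or_adj(u[1], v[1])


def is_independent(words, n=5):
    """True when no two words of the list can be confused."""
    return not any(strong_product_adjacent(u, v, n)
                   for u, v in combinations(words, 2))


def alpha_cycle(n):
    """Independence number of the cycle C_n by exhaustive search (small n only)."""
    best = 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if all((a - b) % n not in (1, n - 1)
                   for a, b in combinations(subset, 2)):
                best = max(best, r)
    return best


# Five pairwise non-confusable words: (i, 2i mod 5) for i = 0..4.
pentagon_code = [(i, (2 * i) % 5) for i in range(5)]
```

Here `alpha_cycle(5)` is 2, so one-letter messages give only $2^2 = 4$ words of length two, whereas `pentagon_code` has five.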

It was very early recognized that the determination of the Shannon capacity 
is a very difficult problem, even for small and simple graphs (see [8, 13]). In 
a famous paper of 1979, Lovasz introduced the theta function $\vartheta(G)$, with the
explicit goal of estimating $\Theta(G)$ [11].

Shannon capacity and Lovasz theta function attracted a lot of interest in the
scientific community, because of the applications to communication issues, but
also due to the connections with some central combinatorial and computational
questions in graph theory, like computing the largest clique and finding the
chromatic number of a graph (see [2, 3, 4, 6] for a sample of the wealth of
different results and applications of $\vartheta(G)$ and $\Theta(G)$). Despite a lot of work in
the field, finding the explicit value of the theta function for interesting special 
classes of graphs is still an open problem. 

In this paper we present some results on the theta function of circulant 
graphs, i.e., graphs which admit a circulant adjacency matrix. We recall that a 
circulant matrix is fully determined by its first row, each other row being a cyclic 
shift of the previous one. Such graphs span a wide spectrum, whose extremes are 
the single cycle and the complete graph. We either give a formula or an algorithm 
for computing the Lovasz number of circulant graphs given by the union of 
two cycles. The algorithm is based on the computation of the intersection of
halfplanes and (although its running time is $O(n \log n)$ in the worst case, as
compared with the linear time achievable through linear programming) is very
efficient in practice, since it exploits the particular geometric structure of the 
intersection. 



2 Preliminaries 



There are several equivalent definitions for the Lovasz theta function (see, e.g., 
the survey by Knuth [10]). We give here the one that comes out of Theorem 6 
in [11], because it requires only little technical machinery. 

Definition 1. Let G be a graph and let $\mathcal{A}$ be the family of matrices $A = (a_{ij})$
such that $a_{ij} = 0$ if i and j are adjacent in G. Also, let
$\lambda_1(A) \ge \lambda_2(A) \ge \ldots \ge \lambda_n(A)$ denote the eigenvalues of A. Then

$$\vartheta(G) = \max_{A \in \mathcal{A}} \left\{ 1 - \frac{\lambda_1(A)}{\lambda_n(A)} \right\}.$$

Combining the fact that $\Theta(G) \le \vartheta(G)$ with the easy lower bound
$\Theta(C_5) \ge \sqrt{5}$, Lovasz has been able to determine exactly the capacity of
$C_5$, the pentagon, which turns out to be $\sqrt{5}$.

For several families of simple graphs, the value of $\vartheta(G)$ is given by explicit
formulas. For instance, in the case of odd cycles of length n we have

$$\vartheta(C_n) = \frac{n \cos(\pi/n)}{1 + \cos(\pi/n)}.$$



We now sketch the proof of correctness of the above formula (see [10] for more 
details), because it will be instrumental to the more general results obtained in 
this paper. 

With reference to the definition of the Lovasz number which resorts to the
minimum of the largest eigenvalue over all feasible matrices (Section 6 in [10]), in
the case of n-cycles, we have that a feasible matrix has ones everywhere, except
on the superdiagonal, subdiagonal and the upper-right and lower-left corners, i.e.
it can be written as $C = J + xP + xP^{-1}$, where J is a matrix whose entries are all
equal to one, and P is the permutation matrix taking j into (j + 1) mod n. It is
well known and easy to see that the eigenvalues of C are $n + 2x$, and
$x(\omega^j + \omega^{-j})$ for $j = 1, \ldots, n-1$, where $\omega = e^{2\pi i/n}$. The minimum over x of
the maximum of these values is obtained when $n + 2x = -2x\cos(\pi/n)$, which
immediately leads to the above formula.
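The argument can be checked numerically. The sketch below is an illustration, not part of the paper, and its function names are hypothetical. It compares the closed form for odd cycles with a direct minimisation over x of the largest eigenvalue of $J + xP + xP^{-1}$, using the eigenvalues $n + 2x$ and $2x\cos(2\pi j/n)$ listed above.

```python
import math


def theta_odd_cycle(n):
    """Closed form n*cos(pi/n) / (1 + cos(pi/n)) for an odd cycle C_n."""
    c = math.cos(math.pi / n)
    return n * c / (1 + c)


def max_eigenvalue(n, x):
    """Largest eigenvalue of J + x*P + x*P^{-1}: n + 2x and 2x*cos(2*pi*j/n)."""
    values = [n + 2 * x]
    values += [2 * x * math.cos(2 * math.pi * j / n) for j in range(1, n)]
    return max(values)


def theta_by_search(n, iterations=200):
    """Minimise the convex, piecewise-linear map x -> max_eigenvalue(n, x)
    by ternary search on [-n, 0], which contains the minimiser."""
    lo, hi = -float(n), 0.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if max_eigenvalue(n, m1) < max_eigenvalue(n, m2):
            hi = m2
        else:
            lo = m1
    return max_eigenvalue(n, (lo + hi) / 2)
```

For the pentagon, `theta_odd_cycle(5)` agrees with $\sqrt{5}$, the value quoted above for the capacity of $C_5$.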

3 The Function $\vartheta$ of Circulant Graphs of Degree 4

Let n be an odd integer and let j be such that $1 < j \le \frac{n-1}{2}$. Let C(n,j)
denote the circulant graph with vertex set {0, ..., n-1} and edge set {{i, i +
1 mod n}, {i, i + j mod n}, i = 0, ..., n-1}. By using the approach sketched in
[10], we can easily obtain the following result.

Lemma 1. Let $f_0(x,y) = n + 2x + 2y$ and, for some fixed value of j,
$f_i(x,y) = 2x\cos\frac{2\pi i}{n} + 2y\cos\frac{2\pi i j}{n}$, $i = 1, \ldots, n-1$. Then

$$\vartheta(C(n,j)) = \min_{x,y} \max_i \left\{ f_i(x,y),\ i = 0, 1, \ldots, \left\lfloor \frac{n}{2} \right\rfloor \right\}. \quad (1)$$

Proof. Follows from the same arguments which lead to the known formula for 
the Lovasz number of odd cycles [10] (i.e., taking advantage of the fact that we 
can restrict the set of feasible matrices within the family of circulant matrices) 
and observing that, for $i \ge 1$, $f_i(x,y) = f_{n-i}(x,y)$. □

3.1 A Linear Programming Formulation 

Throughout the rest of this paper we will consider the following linear program- 
ming formulation of (1). 

minimize z

s.t. $f_i(x,y) - z \le 0$, $i = 0, \ldots, \lfloor n/2 \rfloor$, (2)

$z \ge 0$,

where the $f_i(x,y)$'s are defined in Lemma 1.

Consider the intersection C of the closed halfspaces defined by $z \ge 0$ and
$f_i(x,y) - z \le 0$, $i = 1, \ldots, \lfloor n/2 \rfloor$ (which is not empty, since any point
(0,0,k), $k \ge 0$, satisfies all the inequalities). C is a polyhedral cone with the
apex at the origin. This follows from the two following facts, which can be easily
verified: (1) the equations $f_i(x,y) - z = 0$, $i \ge 1$, define hyperplanes through the
origin; (2) for any $z_0 > 0$, the projection $Q_{z_0}$ of $C \cap \{z = z_0\}$ onto the xy
plane is a polygon, i.e., $Q_{z_0}$ is bounded^1 (see Fig. 1, which corresponds to the
graph C(13,2)).

^1 In the appendix we shall give a rigorous proof of this fact for the case j = 2.

Fig. 1. The polyhedral cone for n = 13 and j = 2 cut at z = 2

Consider now the first constraint of formulation (2). The region represented
by such constraint is the halfspace above the plane A with equation
$n + 2x + 2y - z = 0$. It is then easy to see that the minimum z of (2) will
correspond to the point P = (x, y, z) of C that is the last met by a sweeping line,
parallel to the line y = -x, which moves on the surface of A towards the negative
orthant (we will simply refer to these as to the extremal vertices). In particular,
x and y are the coordinates of the extremal vertex v of the convex polygon $Q_z$ in
the third quadrant. The lines which define v have equations
$2x\cos\alpha + 2y\cos(j\alpha) = z$ and $2x\cos\beta + 2y\cos(j\beta) = z$, where
$\alpha = \frac{2\pi i_1}{n}$ and $\beta = \frac{2\pi i_2}{n}$ for some indices $i_1$ and $i_2$.
The key property, which we will exploit both to determine a closed formula for
the $\vartheta$ function (in the cases j = 2 and j = 3) and to implement an efficient
algorithm for the general case of circulant graphs of degree 4, is that $i_1$ and $i_2$
can be computed using "any" projection polygon $Q_{z_0}$, $z_0 > 0$, and determining
its extremal point in the third quadrant. Once $i_1$ and $i_2$ are known, z can be
computed by solving the following linear system

$$\begin{cases} 2x\cos\alpha + 2y\cos(j\alpha) - z = 0, \\ 2x\cos\beta + 2y\cos(j\beta) - z = 0, \\ 2x + 2y - z = -n. \end{cases} \quad (3)$$
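For small n the indices $i_1$ and $i_2$ can also be found by exhaustive search. The sketch below is an illustration and not the authors' algorithm; the function name and the tolerance are assumptions. It relies on the statement above that the optimum of (2) lies on the plane $n + 2x + 2y - z = 0$. It solves system (3) for every pair of indices, discards solutions that violate a constraint of (2), and keeps the smallest z. It takes cubic time in n, so it serves only as a cross-check.

```python
import math


def theta_circulant(n, j, tol=1e-9):
    """Smallest feasible z over all solutions of system (3), i.e. over all
    pairs (i1, i2) with 1 <= i1 < i2 <= floor(n/2)."""
    m = n // 2

    def f(i, x, y):
        # f_i(x, y) from Lemma 1, for i >= 1.
        return (2 * x * math.cos(2 * math.pi * i / n)
                + 2 * y * math.cos(2 * math.pi * i * j / n))

    best = None
    for i1 in range(1, m + 1):
        for i2 in range(i1 + 1, m + 1):
            # Subtracting the third equation of (3) from the first two
            # eliminates z and leaves a 2 x 2 system in x and y.
            a1 = 2 * (math.cos(2 * math.pi * i1 / n) - 1)
            b1 = 2 * (math.cos(2 * math.pi * i1 * j / n) - 1)
            a2 = 2 * (math.cos(2 * math.pi * i2 / n) - 1)
            b2 = 2 * (math.cos(2 * math.pi * i2 * j / n) - 1)
            det = a1 * b2 - a2 * b1
            if abs(det) < 1e-12:
                continue  # no unique intersection point for this pair
            x = n * (b2 - b1) / det
            y = n * (a1 - a2) / det
            z = n + 2 * x + 2 * y
            feasible = z >= -tol and all(
                f(i, x, y) <= z + tol for i in range(1, m + 1))
            if feasible and (best is None or z < best):
                best = z
    return best
```

Two sanity checks follow from standard facts. C(5,2) is the complete graph on five vertices, whose theta function is 1. C(7,2) is the complement of the 7-cycle, so its value is $7/\vartheta(C_7) = 1 + 1/\cos(\pi/7)$.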



3.2 The Special Case j = 2 

The detailed proof of the following theorem is deferred to the appendix. 



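Theorem 1.

$$\vartheta(C(n,2)) = n \left( 1 - \frac{\frac{1}{2} - \cos\left(\frac{2\pi}{n}\lfloor n/3 \rfloor\right) - \cos\left(\frac{2\pi}{n}(\lfloor n/3 \rfloor + 1)\right)}{\left(\cos\left(\frac{2\pi}{n}\lfloor n/3 \rfloor\right) - 1\right)\left(\cos\left(\frac{2\pi}{n}(\lfloor n/3 \rfloor + 1)\right) - 1\right)} \right). \quad (4)$$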
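Formula (4) is what system (3) yields for j = 2 with $i_1 = \lfloor n/3 \rfloor$ and $i_2 = \lfloor n/3 \rfloor + 1$, using $\cos 2\alpha = 2\cos^2\alpha - 1$. A short Python sketch of the evaluation, given as an illustration with a hypothetical function name:

```python
import math


def theta_c_n_2(n):
    """Evaluate formula (4) for the circulant graph C(n, 2), n odd."""
    i1 = n // 3
    ca = math.cos(2 * math.pi * i1 / n)        # cos(2*pi*floor(n/3)/n)
    cb = math.cos(2 * math.pi * (i1 + 1) / n)  # cos(2*pi*(floor(n/3)+1)/n)
    return n * (1 - (0.5 - ca - cb) / ((ca - 1) * (cb - 1)))
```

For n = 7 the graph C(7,2) is the complement of the 7-cycle, so the result should equal $7/\vartheta(C_7) = 1 + 1/\cos(\pi/7)$, which gives an independent check of the formula.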



3.3 The Special Case j = 3

Consider again the projection polygon $Q_{z_0}$, for some $z_0 > 0$. We know from
Section 3.1 that the value of $\vartheta$ is the optimal value z of the objective function
in the linear program (2), and that this value is achieved at the extremal vertex P
of $Q_z$ in the third quadrant. Also, we know that any projection polygon $Q_{z_0}$ can
be used to determine the two lines $f_i(x,y) - z = 0$, $i \ge 1$, which form P. It turns
out that finding such lines is easy when j = 3. In the following we will say that
the line $l_i$ has positive x-intercept (resp., y-intercept) if the intersection between
$l_i$ and the x-axis (resp., y-axis) has positive x-coordinate (resp., y-coordinate),
otherwise we will say that the intercept is negative. The crucial observation is
the following. Among the lines with negative x- and y-intercepts, $l_{\lfloor n/2 \rfloor}$ is the
one for which these intersections are closest to the origin. It then follows that
P must lie on this line and (after a moment of thought) that the second line
forming P must be searched among those with positive slope. Let $x_i$ and $y_i$
denote the coordinates of the intersection between the line $l_i$ and the line
$l_{\lfloor n/2 \rfloor}$. Now, since $l_{\lfloor n/2 \rfloor}$ is slightly steeper than the line with equation
y = -x, the line sought will be the one with positive x-intercept, negative
y-intercept, and such that $y_i$ is maximum (see Fig. 2). We shall prove that such
line is the one with
index h = • 



To this end, observe first that the requirement of positive x-intercept and 
negative y-intercept implies < * < f (recall that k has equation y = 

cos 

~ s"i X + )■ To prove that In maximizes y^ we show that, for any integer 

i ^ n in the interval < * < f , the three points Vi = (xi,yi), Vn = {xn, yn) and 
O = (0, 0) form a clockwise circuit. We already know (see the Appendix) that 
this amounts to proving that d{vi,Vn,0) = Xiyn — ViXn < 0. This is easy; the 
only formal difficulty is working with the integer part of Clearly this might 
be circumvented by dealing with the three different cases, namely n = 6k + 1,
n = 6k + 3, and n = 6k + 5, for some positive integer k. For simplicity we shall 
prove the first case only. Now, for n = 6k + 1 we have and, using 

zq = 2 , 
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Xfi ^ 



cos ^ cos(^^ - 



j-\-cos^ 



Xi = 



cos ^+cos 

; — cos —cos cos — ’ 



cos — +COS 



cos — cos ■ 



After some relatively simple algebra we obtain 



d{Vi,Vn,0) 

where 



a + /? + 7 

(cos — cos (f — + COS^ 21) (cos — cos — — cos - cos — ) ’ 

V n \ 6 on / n/V n n n n / 



a = cos I (cos ^ + cos , 

/3 = cos(%2) (cos ^ - cos ^) , 

7 = cos (| — (cos ^ + cos . 



It is easy to check that a,/?, 7 > 0 while the denominator of d{vi,Vn,0) is 
negative for the admissible values of i. We are now able to determine the value
of the $\vartheta$ function of C(n, 3).



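Theorem 2.

$$\vartheta(C(n,3)) = n \left( 1 - \frac{\cos^2\frac{\pi}{n} - \cos\frac{\pi}{n}\cos\left(\frac{2\pi}{n}\left\lceil\frac{n-3}{6}\right\rceil\right) + \cos^2\left(\frac{2\pi}{n}\left\lceil\frac{n-3}{6}\right\rceil\right) - 1}{\left(\cos\frac{\pi}{n} + 1\right)\left(\cos\left(\frac{2\pi}{n}\left\lceil\frac{n-3}{6}\right\rceil\right) - 1\right)\left(1 - \cos\frac{\pi}{n} + \cos\left(\frac{2\pi}{n}\left\lceil\frac{n-3}{6}\right\rceil\right)\right)} \right). \quad (5)$$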



Proof. $\vartheta(C(n,3))$ is the value of z in the solution to the linear system (3) where
j = 3, i.e.,

$$z = n \left( 1 - \frac{\cos^2\alpha + \cos\alpha\cos\beta + \cos^2\beta - 1}{(1 - \cos\alpha)(\cos\beta - 1)(\cos\alpha + \cos\beta + 1)} \right),$$

where $\alpha = \frac{2\pi i_1}{n}$ and $\beta = \frac{2\pi i_2}{n}$. By the previous results we know that
$i_1 = \lfloor n/2 \rfloor$ and $i_2 = \lceil (n-3)/6 \rceil$. Plugging these values into the expression
for z we get the desired result. □
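The expression for z can be evaluated directly. The Python sketch below is an illustration with a hypothetical function name. The first index is $\lfloor n/2 \rfloor$, as stated in Section 3.3. The second index, taken here as $\lceil (n-3)/6 \rceil$, is an assumption of the sketch and should be checked against the original statement of Theorem 2.

```python
import math


def theta_c_n_3(n):
    """Evaluate z from system (3) with j = 3, for n odd.

    Assumption of this sketch: the two optimal lines have indices
    i1 = floor(n/2) and i2 = ceil((n - 3) / 6).
    """
    i1 = n // 2
    i2 = (n + 2) // 6  # integer form of ceil((n - 3) / 6)
    ca = math.cos(2 * math.pi * i1 / n)
    cb = math.cos(2 * math.pi * i2 / n)
    num = ca * ca + ca * cb + cb * cb - 1
    den = (1 - ca) * (cb - 1) * (ca + cb + 1)
    return n * (1 - num / den)
```

A consistency check is available for n = 7: multiplying the vertex labels by 2 modulo 7 maps C(7,3) onto C(7,2), the complement of the 7-cycle, so the value should be $1 + 1/\cos(\pi/7)$.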



4 An Efficient Algorithm and Computational Results 

Although the Lovasz number can be computed in polynomial time, the available 
algorithms are far from simple and efficient (see, e.g., [1]). It is thus desirable 
to devise efficient algorithms tailored to the computation of $\vartheta$ for special classes
of graphs. By reduction to linear programming, the theta function of circulant
graphs can be computed in linear time, provided that the number of cycles is
independent of n. The corresponding algorithms are not necessarily efficient in
practice, though. We briefly describe a practically efficient algorithm for computing
$\vartheta(C(n,j))$, i.e., in case of two cycles.

The algorithm first determines the 2 lines forming the extremal vertex of $Q_1$
in the third quadrant, then solves the resulting 3 x 3 linear system (i.e., the
system (3)). More precisely, the algorithm incrementally builds the intersection
of the halfplanes which define $Q_1$ (considering only the third quadrant) and
keeps track of the extremal point. The running time is $O(n \log n)$ in the worst
case (i.e., it does not improve upon the optimal algorithms for computing the
intersection of n arbitrary halfplanes or, equivalently, the convex hull of n points
in the plane). However, it does make use of the properties of the lines bounding
the halfplanes to keep the number of vertices of the incremental intersection
close to the minimum possible. In some cases (such as C(n,2)) this is still
$\Omega(n)$, but for most values of n and j it turns out to be substantially smaller.

Using the above algorithm we have performed some preliminary experiments
to get insights about the behavior of the theta function for the special class of
circulant graphs considered in this abstract. Actually, since the value sandwiched
by the clique and the chromatic number of C(n,j) is the theta function of
$\overline{C(n,j)}$ (i.e., the complementary graph of C(n,j)), the results refer to
$\vartheta(\overline{C(n,j)}) = \frac{n}{\vartheta(C(n,j))}$.

Table 1 shows $\vartheta(\overline{C(n,j)})$ approximated to four decimal places, for a number
of values of n and j. It is immediate to note that, for a fixed value of j,
the values of the theta function seem to slowly approach, as n grows, the lower
bound (given by the clique number), which happens to be 2 almost always (obvious
exceptions occur when 3 divides n and $j = \frac{n}{3}$).



Table 1. Some computed values of $\vartheta(\overline{C(n,j)})$

  n \ j   4       5       6       7       ⌊?⌋     ⌈?⌉     ⌊?⌋     ⌊?⌋+1   ?
     51   2.2446  2.0474  2.1227  2.0838  2.1297  2.2446  3       2.0173  2.2446
    101   2.2383  2.0121  2.1122  2.0228  2.2383  2.1162  2.2383  2.0044  2.2383
    201   2.2366  2.0031  2.1103  2.0059  2.2366  2.1113  3       2.0011  2.2366
    301   2.2363  2.0014  2.1099  2.0027  2.2363  2.1099  2.0005  2.2363  2.2363
    401   2.2362  2.0008  2.11    2.0015  2.2362  2.1102  2.2362  2.0003  2.2362
    501   2.2362  2.0005  2.11    2.001   2.2362  2.1102  3       2.0002  2.2362
   1001   2.2361  2.0001  2.1099  2.0002  2.2361  2.1099  2.2361  2       2.2361
   2001   2.2361  2       2.1099  2.0001  2.2361  2.1099  3       2       2.2361
   3001   2.2361  2       2.1099  2       2.2361  2.1099  2       2.2361  2.2361
   4001   2.2361  2       2.1099  2       2.2361  2.1099  2.2361  2       2.2361
   5001   2.2361  2       2.1099  2       2.2361  2.1099  3       2       2.2361
  10001   2.2361  2       2.1099  2       2.2361  2.1099  2.2361  2       2.2361



This is confirmed by the results in Table 2, which depicts the behavior of
the relative distance $d_{nj}$ of $\vartheta(\overline{C(n,j)})$ from the clique number. We only
consider odd values of j (so that the clique number is always 2); we also rule out
the cases where j = ^, for which we know there is a (relatively) large gap be- 
tween clique number and theta function. More precisely. Table 2 shows: (1) the 
maximum relative distance $M = \max_{j,n} d_{nj}$, where n ranges over all odd
integers from 9 to the bound listed in the first column of the table; (2) the
average relative distance $\mu = \frac{1}{N_n} \sum_{j,n} d_{nj}$, where $N_n$ is the number
of admissible pairs (n,j); (3) the average quadratic distance
$\sigma = \frac{1}{N_n} \sum_{j,n} (d_{nj} - \mu)^2$.

The regularities presented by the value of the theta function and by the 
geometric structure of the optimal lines suggest the possibilities of further ana- 
lytic investigations. For instance, we have observed that, for j = 4, the formula 
i = [^arccos ~^^^ J seems to correctly predict the index of the first optimal 
line, in perfect agreement with the experimental results. In general, for j even
and j << n, up to a value j, the optimal point seems to always correspond

Table 2. Relative distances of $\vartheta(\overline{C(n,j)})$ from the clique number

     n   M         μ         σ
   101   0.372402  0.056077  0.004343
   201   0.372402  0.033712  0.002600
   301   0.372402  0.024840  0.001876
   401   0.372402  0.019897  0.001471
   501   0.372402  0.016734  0.001214
  1001   0.372402  0.009657  0.000653


to two consecutive indices. For j odd, the first line giving the optimal point is 
almost always obtained at the index the second line varies with j, but with 
a regular behaviour. 

5 Conclusions 

This paper has provided a first step towards extending the class of graphs for 
whose theta function either a formula or a very efficient algorithm is available. 
Work in progress by the authors [5] aims at finding an efficient algorithm for 
more general circulant graphs. We believe that the results of this paper together
with the above mentioned more general results will contribute to shedding new
light on the properties of this fascinating function.

Appendix

In this appendix we shall prove Theorem 1. Before that, we need to establish 
the following subsidiary result. 

Theorem 3. Let n be odd and $n \ge 7$. Also, as in Section 3, let C be the
intersection of the halfspaces defined by the inequalities $z \ge 0$ and
$f_i(x,y) - z \le 0$, $i = 1, \ldots, \lfloor n/2 \rfloor$. Then, C is a polyhedral cone with the
apex at the origin. The 1-dimensional faces of C (i.e., the edges of C) are the
intersections (in the halfspace $z \ge 0$) of "consecutive" pairs of planes $P_i$,
$P_{s(i)}$, where $P_i$ is defined by the equation $f_i(x,y) - z = 0$ and

$$s(i) = \begin{cases} i + 1 & \text{if } i < \lfloor n/2 \rfloor, \\ 1 & \text{otherwise.} \end{cases}$$

Proof. It is sufficient to show that (1) for any $z_0 > 0$, $Q_{z_0}$ is bounded^2 (i.e., is
a polygon), so that the polyhedron is indeed a pointed cone with the apex at
the origin, and (2) $Q_{z_0}$ has exactly $\lfloor n/2 \rfloor$ vertices, formed by the intersections
of pairs of consecutive lines $l_i$ and $l_{s(i)}$, where $l_i$ has equation
$f_i(x,y) - z_0 = 0$, $i = 1, \ldots, \lfloor n/2 \rfloor$. To this end, we first establish a couple of
preliminary results.

Given any two intersecting lines $l$ and $l'$ in the 2D space, we will say that
$l'$ is clockwise from $l$ if the two lines can be overlapped by rotating $l$ clockwise
around the intersection point by an angle of less than $\pi/2$ radians.

Lemma 2. For $n \ge 11$, $l_{i+1}$ is clockwise from $l_i$, $i = 1, \ldots, \lfloor n/2 \rfloor - 1$.

Proof. The equation defining $l_i$ can be written as $y = m_i x + q_i$, where
$m_i = -\frac{\cos\alpha_i}{\cos 2\alpha_i}$ and $\alpha_i = \frac{2\pi i}{n}$. Thus
$-\frac{\pi}{2} < \varphi_i = \mathrm{arctg}(m_i) < \frac{\pi}{2}$ is the angle between the positive
x-axis and $l_i$. It is clearly sufficient to prove the following statements.

1. If $\varphi_i \varphi_{i+1} > 0$ then $\varphi_i > \varphi_{i+1}$;

2. if $\varphi_i > 0$ and $\varphi_{i+1} < 0$ then $\varphi_i - \varphi_{i+1} < \frac{\pi}{2}$;

3. if $\varphi_i < 0$ and $\varphi_{i+1} > 0$ then $\varphi_{i+1} - \varphi_i > \frac{\pi}{2}$.

^2 Recall that $Q_{z_0}$ is the projection of $C \cap \{z = z_0\}$ onto the xy plane.

It is not difficult to see that Condition 1 (i.e., $\varphi_i \varphi_{i+1} > 0$) occurs if and
only if $\frac{\pi}{4}(k-1) < \alpha_i < \alpha_{i+1} < \frac{\pi}{4}k$, for some $k \in \{1, 2, 3, 4\}$. Since
the denominator of $m_t$ does not vanish when $t \in [i, i+1]$, then $m_t$ is a continuous
function. We shall then prove that $\varphi_i > \varphi_{i+1}$ by showing that, for
$t \in [i, i+1]$, $m_t$ is a monotone decreasing function of t. Indeed, we have

$$\begin{aligned}
\frac{dm_t}{dt} &= -\frac{2\pi}{n} \cdot \frac{-\sin\alpha_t \cos 2\alpha_t + 2\cos\alpha_t \sin 2\alpha_t}{\cos^2 2\alpha_t} \\
&= \frac{2\pi}{n} \cdot \frac{\sin\alpha_t \cos 2\alpha_t - 4\sin\alpha_t \cos^2\alpha_t}{\cos^2 2\alpha_t} \\
&= \frac{2\pi}{n} \cdot \frac{\sin\alpha_t (\cos 2\alpha_t - 4\cos^2\alpha_t)}{\cos^2 2\alpha_t} \\
&= \frac{2\pi}{n} \cdot \frac{\sin\alpha_t (2\cos^2\alpha_t - 1 - 4\cos^2\alpha_t)}{\cos^2 2\alpha_t} \\
&= \frac{2\pi}{n} \cdot \frac{\sin\alpha_t (-1 - 2\cos^2\alpha_t)}{\cos^2 2\alpha_t} < 0.
\end{aligned}$$

The proof of statements 2 and 3 becomes simpler if we assume that n be large
enough (although only statement 3 requires that $n \ge 11$ in order to hold true).
Suppose first that $\varphi_i > 0$ and $\varphi_{i+1} < 0$. This only happens if
$i < \frac{n}{4} < i + 1$ (i.e., if both angles $\alpha_i$ and $\alpha_{i+1}$ are close to
$\frac{\pi}{2}$). For n large enough this clearly means that both $m_i$ and $-m_{i+1}$ are
positive and close to zero, which in turn implies that $\varphi_i - \varphi_{i+1}$ is close to 0.

The proof of statement 3 is similar. The condition $\varphi_i < 0$ and $\varphi_{i+1} > 0$
occurs if either $i < \frac{n}{8} < i + 1$ or $i < \frac{3n}{8} < i + 1$. In both cases, $-m_i$ and
$m_{i+1}$ approach infinity as n grows, which means that $\varphi_{i+1} - \varphi_i$ approaches
$\pi$. □

As an example, in Fig. 3 we see (following the clockwise order) that, for $n = 13$
and $i = 1, 2, 3$, the line $l_{i+1}$ is indeed clockwise from $l_i$.

Lemma 3. For $i = 1, \ldots, \frac{n-1}{2} - 1$, let $v_i$ denote the
intersection point between $l_i$ and $l_{i+1}$. Then any two points $v_i$ and
$v_{i+1}$, for $i = 1, \ldots, \frac{n-1}{2} - 2$, together with the origin form
a clockwise circuit.

Proof. It is well known (see, e.g., [12]) that three arbitrary points
$a = (a_0, a_1)$, $b = (b_0, b_1)$, and $c = (c_0, c_1)$ form a clockwise circuit
if and only if

$$d(a, b, c) = \begin{vmatrix} a_0 & a_1 & 1 \\ b_0 & b_1 & 1 \\ c_0 & c_1 & 1 \end{vmatrix} < 0.$$
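The orientation test above can be sketched in a few lines. This is only an illustration: the function names `orientation_det` and `is_clockwise` are hypothetical, and points are assumed to be given as coordinate pairs.

```python
def orientation_det(a, b, c):
    """Determinant d(a, b, c) of the 3x3 matrix whose rows are (x, y, 1)."""
    (a0, a1), (b0, b1), (c0, c1) = a, b, c
    # Cofactor expansion along the first row.
    return a0 * (b1 - c1) - a1 * (b0 - c0) + (b0 * c1 - b1 * c0)


def is_clockwise(a, b, c):
    """True if a, b, c (in this order) form a clockwise circuit."""
    return orientation_det(a, b, c) < 0
```

When the third point is the origin, the determinant reduces to a0*b1 - a1*b0, which is the form used in the rest of the proof.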



In our case $c_0 = c_1 = 0$, so that the above determinant simplifies to
$a_0 b_1 - a_1 b_0$, where $a_0$ and $a_1$ (resp., $b_0$ and $b_1$) are the
coordinates of $v_i$ (resp., $v_{i+1}$). To determine $a_0$ and $a_1$ we can solve
the $2 \times 2$ linear system (where, for simplicity, we have set $z_0 = 1$)

$$\begin{cases} f_i(x, y) - 1 = 0 \\ f_{i+1}(x, y) - 1 = 0, \end{cases}$$




On the Lovasz Number of Certain Circulant Graphs 301 





Fig. 3. Lines $l_i$ in clockwise order ($n = 13$)



obtaining

$$a_0 = \frac{\cos(2t(i+1)) - \cos(2ti)}{2\,(\cos(2t(i+1))\cos(ti) - \cos(2ti)\cos(t(i+1)))},$$

and

$$a_1 = \frac{\cos(ti)\cos(2ti) - \cos(2ti)\cos(t(i+1))}{2\cos(2ti)\,(\cos(2t(i+1))\cos(ti) - \cos(2ti)\cos(t(i+1)))},$$

where $t = \frac{2\pi}{n}$.

For $v_{i+1}$ we clearly obtain similar values (the correspondence being exactly
given by replacing $i$ with $i + 1$ everywhere). After some simple but quite
tedious algebra, the corresponding formula for the determinant simplifies to

$$d(v_i, v_{i+1}, O) = \frac{\cos(t)\cos(t(i+1)) - \cos(ti)}{(2\cos(ti)\cos(t(i+1)) + 1)(2\cos(t(i+1))\cos(t(i+2)) + 1)}.$$

We now prove that $d = d(v_i, v_{i+1}, O) < 0$. Consider the numerator of $d$.
Since $\cos(ti) > \cos(t(i+1))$ (recall that $0 < ti < t(i+1) < t(i+2) < \pi$),
the numerator is clearly negative when $\cos(ti) > 0$. However, since
$\cos(t)\cos(t(i+1)) - \cos(ti) = \cos(t(i+2)) - \cos(t)\cos(t(i+1))$, we can
easily see that the numerator is also negative when $\cos(ti) < 0$. It remains to
show that the denominator of $d$ is positive. The denominator is the product of
two terms, and the same argument applies to each of them. Clearly
$2\cos(ti)\cos(t(i+1)) + 1 > 0$ when $\cos(ti)\cos(t(i+1)) > 0$. Hence the term
might be negative only when $ti < \frac{\pi}{2} < t(i+1)$. In this case, however,
both angles are close to $\frac{\pi}{2}$ and thus $|2\cos(ti)\cos(t(i+1))|$ is
small compared to 1 (as in the proof of Lemma 2, this fact is obvious for large
$n$, although it holds for any $n \ge 7$). □

We are now able to complete the proof of Theorem 3. As in Lemma 3, let $v_i$
denote the intersection point of $l_i$ and $l_{i+1}$,
$i = 1, \ldots, \frac{n-1}{2} - 1$. Also, let $v_{\frac{n-1}{2}}$ denote the
intersection point of $l_{\frac{n-1}{2}}$ and $l_1$. By Lemmas 2 and 3, we know
that any three consecutive vertices of the closed polygon
$L = v_1 v_2 \cdots v_{\frac{n-1}{2}}$ make a right turn (except, possibly, for
the two triples which include $v_{\frac{n-1}{2}}$ and $v_1$). We also know that
the angle $\varphi_i$, as a function of $i$, changes sign three times only,
starting from a negative value for $i = 1$. Hence the polygon $L$ may possibly
have the three shapes depicted in Fig. 4: (1) $L$ is convex, (2) $L$ is simple
but not convex, (3) $L$ is not simple.




Fig. 4. Three possible forms for the polygon L of Theorem 3 



Case 2 would clearly correspond to $Q_{z_0}$ being unbounded, while case 3 would
imply that the number of vertices of $Q_{z_0}$ is less than $\frac{n-1}{2}$ and
that not all of them are formed by the intersection of consecutive lines. Hence,
to prove the result, we have to prove that only Case 1 indeed occurs. But this
is easy. In fact, cases 2 and 3 can be ruled out simply by observing that, in
those cases, the three points $v_{\frac{n-1}{2}}$, $v_1$, and $O$ make a left
turn, while in case 1 they make a right turn. Computing the appropriate
determinant $d$ we get

$$d = \frac{-(2a - 1)(a + 1)(4a^2 - 2a - 1)}{2\,(4a^3 - 2a - 1)(32a^6 - 48a^4 + 20a^2 - 1)},$$

where $a = \cos\frac{\pi}{n}$. Now, the numerator is negative for
$a > .8090169945$ (the largest root of $4x^2 - 2x - 1 = 0$) while the denominator
is positive for $a > .8846461772$ (the unique real root of $4x^3 - 2x - 1 = 0$).
But for $n = 7$ we already have $a = .9009688678$; hence $d < 0$ for any
$n \ge 7$ and the three points make a right turn, as required.
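As a numerical sanity check, the closed form for d can be compared with a direct computation from the two intersection points. The sketch below makes two assumptions that should be read as such: that the closed form is the rational function in a = cos(π/n) with numerator -(2a - 1)(a + 1)(4a² - 2a - 1) and denominator 2(4a³ - 2a - 1)(32a⁶ - 48a⁴ + 20a² - 1), and that, with z0 = 1, line l_i has the form 2cos(ti)x + 2cos(2ti)y = 1 with t = 2π/n, which is the form the intersection coordinates in the proof of Lemma 3 correspond to. The function names are hypothetical.

```python
import math


def vertex(j, k, n):
    """Intersection of lines l_j and l_k (assumed form, z0 = 1)."""
    t = 2 * math.pi / n
    cj, ck = math.cos(t * j), math.cos(t * k)
    Cj, Ck = math.cos(2 * t * j), math.cos(2 * t * k)
    den = 2 * (cj * Ck - ck * Cj)
    return ((Ck - Cj) / den, (cj - ck) / den)


def d_direct(n):
    """a0*b1 - a1*b0 for the points v_{(n-1)/2}, v_1 and the origin."""
    m = (n - 1) // 2
    a0, a1 = vertex(m, 1, n)   # v_{(n-1)/2} = l_{(n-1)/2} ∩ l_1
    b0, b1 = vertex(1, 2, n)   # v_1 = l_1 ∩ l_2
    return a0 * b1 - a1 * b0


def d_closed(n):
    """Closed form of the same determinant in a = cos(pi / n)."""
    a = math.cos(math.pi / n)
    num = -(2 * a - 1) * (a + 1) * (4 * a**2 - 2 * a - 1)
    den = 2 * (4 * a**3 - 2 * a - 1) * (32 * a**6 - 48 * a**4 + 20 * a**2 - 1)
    return num / den
```

Under these assumptions both functions agree up to rounding for odd n ≥ 7, and both values are negative.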

As the last observation, we recall that the proof holds for odd $n \ge 11$
(because of Lemma 2). However, the result is true for any odd $n \ge 7$, as can
be seen by directly checking the cases $n = 7$ and $n = 9$ (see Fig. 5). □





Fig. 5. Projection $Q_{z_0}$ for $n = 7$ (left) and $n = 9$



Proof of Theorem 1. Consider the linear system (3) with J = 2. By Theorem 
3, we know that a = and [3 = , for some i G {l, ..., — l}. The 

solution to (3) is given by a; = ^7 — cos«+cos^ _ ^ 1 — and 

z = n + 2a; + 2y = n ^1 — (eos a^^iXcoTy-i) ) ■ determine the 

value of i which minimizes z. More precisely, we will compute the minimum, 
over the set {l, 2, ..., — l}, of the following function 



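$$g_n(x) = \frac{\cos\frac{2\pi x}{n} + \cos\frac{2\pi(x+1)}{n} - \frac{1}{2}}{\left(\cos\frac{2\pi x}{n} - 1\right)\left(\cos\frac{2\pi(x+1)}{n} - 1\right)}.$$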



gn{x) is a continuous function in the open interval (0,27 t), with lim,j,^o+ = 
lim,j,^27r- = +00. Computing the derivative we obtain 



9n{x) = -- 



IT 



CSC ^ CSC 



8n 



-hn{x), 



where hn{x) = sin ^ sin .y sin -p sin Clearly, 

9n{x) = 0 if and only if hn{x) = 0. As a first rough argument (which is useful 
just to locate the zero x of hn), we note that hn{x) > 0 if < tj-. This

implies that x must be greater than But then, for n large enough, we may 
“approximate” hn{x) with hn{x) = 2 sin + 2 sin which vanishes at a; = 

So i ^ and we see that ^ < 2x — l < 2a; + 3 < tt and tt < 4i + l < 4i + 3 < 

We now use this result to obtain tight bounds for x. We observe that hn{x) is 
positive if 



sin 27 t — 



(4a; + 3)7 t' 



n 



< mm l^sin 

(2a; + 3)7 t 
= sm , 



. (4a; + l)7r 

sm ^ 

n 



. {2x — \)tv . (2a; + 3)7r' 



\ / 


. (4a; + 3)7 t 




1 = max < 


sm 

n 


1 



n 



-, sm ■ 



n 



which amounts to saying that x cannot be less than f — 1- Analogously, hn{x) 
is negative if 



sin 2tt — 



(4a; + 1)7T 



n 



mm 



(4a; + 3)7 t 
sm 



n 



(4a; + 1)7T 

sm 

n 



(2x — 1)7T (2a; + 3)7 t ' 

> max { sin , sm 



; I sir 



n 



. (2a; — 1)7T 
= sm , 



which implies that x cannot be larger than This fact allows us to conclude 
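that the integer value which minimizes $g_n(x)$ (and hence $z$) is one among
$\lfloor n/3 \rfloor - 1$, $\lfloor n/3 \rfloor$, and $\lceil n/3 \rceil$. We now
prove that the value sought is $\lfloor n/3 \rfloor$ by showing that
$g_n(\lfloor n/3 \rfloor - 1) - g_n(\lfloor n/3 \rfloor)$ and
$g_n(\lceil n/3 \rceil) - g_n(\lfloor n/3 \rfloor)$ are both positive. For
simplicity, we shall use the following notation:
$f_{\lfloor\rfloor - 1} = \cos\frac{2\pi(\lfloor n/3 \rfloor - 1)}{n}$,
$f_{\lfloor\rfloor} = \cos\frac{2\pi\lfloor n/3 \rfloor}{n}$,
$f_{\lceil\rceil} = \cos\frac{2\pi\lceil n/3 \rceil}{n}$, and
$f_{\lceil\rceil + 1} = \cos\frac{2\pi(\lceil n/3 \rceil + 1)}{n}$. We have

$$\begin{aligned}
g_n(\lceil n/3 \rceil) - g_n(\lfloor n/3 \rfloor)
&= \frac{f_{\lceil\rceil} + f_{\lceil\rceil+1} - \frac12}{(f_{\lceil\rceil} - 1)(f_{\lceil\rceil+1} - 1)} - \frac{f_{\lfloor\rfloor} + f_{\lceil\rceil} - \frac12}{(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)} \\
&= \frac{f_{\lceil\rceil} f_{\lfloor\rfloor} - f_{\lceil\rceil} + f_{\lceil\rceil+1} f_{\lfloor\rfloor} - f_{\lceil\rceil+1} - \frac12 f_{\lfloor\rfloor} + \frac12}{(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)(f_{\lceil\rceil+1} - 1)} \\
&\quad - \frac{f_{\lfloor\rfloor} f_{\lceil\rceil+1} - f_{\lfloor\rfloor} + f_{\lceil\rceil} f_{\lceil\rceil+1} - f_{\lceil\rceil} - \frac12 f_{\lceil\rceil+1} + \frac12}{(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)(f_{\lceil\rceil+1} - 1)} \\
&= \frac{(f_{\lfloor\rfloor} - f_{\lceil\rceil+1})(\frac12 + f_{\lceil\rceil})}{(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)(f_{\lceil\rceil+1} - 1)}.
\end{aligned}$$

The last expression is positive since the denominator is negative,
$f_{\lfloor\rfloor} - f_{\lceil\rceil+1} > 0$, and $f_{\lceil\rceil} < -\frac12$.
Similarly,

$$\begin{aligned}
g_n(\lfloor n/3 \rfloor - 1) - g_n(\lfloor n/3 \rfloor)
&= \frac{f_{\lfloor\rfloor-1} + f_{\lfloor\rfloor} - \frac12}{(f_{\lfloor\rfloor-1} - 1)(f_{\lfloor\rfloor} - 1)} - \frac{f_{\lfloor\rfloor} + f_{\lceil\rceil} - \frac12}{(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)} \\
&= \frac{f_{\lfloor\rfloor-1} f_{\lceil\rceil} - f_{\lfloor\rfloor-1} + f_{\lfloor\rfloor} f_{\lceil\rceil} - f_{\lfloor\rfloor} - \frac12 f_{\lceil\rceil} + \frac12}{(f_{\lfloor\rfloor-1} - 1)(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)} \\
&\quad - \frac{f_{\lfloor\rfloor} f_{\lfloor\rfloor-1} - f_{\lfloor\rfloor} + f_{\lceil\rceil} f_{\lfloor\rfloor-1} - f_{\lceil\rceil} - \frac12 f_{\lfloor\rfloor-1} + \frac12}{(f_{\lfloor\rfloor-1} - 1)(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)} \\
&= \frac{(f_{\lceil\rceil} - f_{\lfloor\rfloor-1})(\frac12 + f_{\lfloor\rfloor})}{(f_{\lfloor\rfloor-1} - 1)(f_{\lfloor\rfloor} - 1)(f_{\lceil\rceil} - 1)},
\end{aligned}$$

and again the numerator is negative since
$f_{\lceil\rceil} - f_{\lfloor\rfloor-1} < 0$ and $f_{\lfloor\rfloor} > -\frac12$. □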
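The conclusion of this proof can be spot-checked numerically. The sketch below assumes that g_n(x) is the ratio of cos(2πx/n) + cos(2π(x+1)/n) - 1/2 to (cos(2πx/n) - 1)(cos(2π(x+1)/n) - 1), which is the form the differences computed above are built from, and that the minimum is taken over x = 1, ..., (n-1)/2 - 1. The names `g` and `argmin_candidates` are hypothetical.

```python
import math


def g(n, x):
    """Assumed form of g_n(x); see the lead-in for the assumption."""
    cx = math.cos(2 * math.pi * x / n)
    cx1 = math.cos(2 * math.pi * (x + 1) / n)
    return (cx + cx1 - 0.5) / ((cx - 1.0) * (cx1 - 1.0))


def argmin_candidates(n, tol=1e-12):
    """All integers in 1..(n-1)/2 - 1 attaining the minimum of g_n (up to tol)."""
    xs = range(1, (n - 1) // 2)
    best = min(g(n, x) for x in xs)
    return [x for x in xs if g(n, x) <= best + tol]
```

A tolerance is used because, when n is a multiple of 3, the factor 1/2 + cos(2π/3) vanishes and the values at n/3 - 1 and n/3 coincide.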
Abstract. Byte pair encoding (BPE) is a simple universal text compression
scheme. Decompression is very fast and requires little work space. Moreover,
it is easy to decompress an arbitrary part of the original text. However, it
has not been popular, since compression is rather slow and the compression
ratio is not as good as that of other methods such as Lempel-Ziv type
compression.

In this paper, we bring out a potential advantage of BPE compression. We show
that it is very well suited, from a practical viewpoint, to compressed pattern
matching, where the goal is to find a pattern directly in compressed text
without decompressing it explicitly. We compare running times to find a pattern
in (1) BPE compressed files, (2) Lempel-Ziv-Welch compressed files, and
(3) original text files, in various situations. Experimental results show that
pattern matching in BPE compressed text is even faster than matching in the
original text. Thus BPE compression reduces not only the disk space but also
the searching time.



1 Introduction 

Pattern matching is one of the most fundamental operations in string processing.
The problem is to find all occurrences of a given pattern in a given text. Many
classical and advanced pattern matching algorithms have been proposed (see
[8,1]). The time complexity of a pattern matching algorithm is measured by the
number of symbol comparisons between pattern and text symbols. The
Knuth-Morris-Pratt (KMP) algorithm [19] is the first one which runs in linear
time proportional to the sum of the pattern length m and the text length n. The
algorithm requires additional memory proportional to the pattern length m. One
interesting research direction is to develop an algorithm which uses only a
constant amount of memory while preserving the linear time complexity (see
[11,7,5,13,12]). Another important direction is to develop an algorithm which
makes a sublinear number of comparisons on the average, as in the Boyer-Moore
(BM) algorithm [4] and its variants (see [24]). The lower bound of the average
case time complexity is known to be $O(n \log m / m)$ [27], and this bound is
achieved by the algorithm presented in [6].
From a practical viewpoint, the constant hidden behind the O-notation plays an
important role. Horspool's variant [14] and Sunday's variant [22] of the BM
algorithm are widely known to be very fast in practice. In fact, the former is
incorporated into the software package Agrep, developed by Wu and Manber [25],
which is regarded as the fastest pattern matching tool.

Recently, a new trend for accelerating pattern matching has emerged: speeding
up pattern matching by text compression. It was first introduced by Manber
[20]. Contrary to the traditional aim of text compression, which is to reduce
the space requirement of text files on secondary disk storage devices, text is
compressed in order to speed up the pattern matching process.

It should be mentioned that the problem of pattern matching in compressed text
without decoding, which is often referred to as compressed pattern matching,
has been studied extensively in this decade. The motivation is to investigate
the complexity of this problem for various compression methods from the
viewpoint of combinatorial pattern matching. It is theoretically interesting,
and in practice some of the proposed algorithms are indeed faster than a
regular decompression followed by a simple search. In fact, Kida et al. [18,17]
and Navarro et al. [21] independently presented compressed pattern matching
algorithms for the Lempel-Ziv-Welch (LZW) compression which run faster than a
decompression followed by a search. However, the algorithms are slow in
comparison with pattern matching in uncompressed text if we compare the CPU
time. In other words, the LZW compression did not speed up the pattern matching.

When searching text files stored in secondary disk storage, the running time is
the sum of file I/O time and CPU time. Obviously, text compression yields a
reduction in the file I/O time at nearly the same rate as the compression ratio.
However, in the case of an adaptive compression method, such as the Lempel-Ziv
family (LZ77, LZSS, LZ78, LZW), a considerable amount of CPU time is devoted to
the extra effort of keeping track of the compression mechanism. In order to
reduce both file I/O time and CPU time, we have to find a compression scheme
that requires no such extra effort. Thus we must re-estimate the performance of
existing compression methods or develop a new compression method in the light
of the new criterion: the time for finding a pattern in compressed text
directly.

As an effective tool for such re-estimation, we introduced in [16] a unifying
framework, named collage system, which abstracts various dictionary-based
compression methods, such as the Lempel-Ziv family and the static dictionary
methods. We developed a general compressed pattern matching algorithm for
strings described in terms of collage systems. Therefore, any of the compression
methods that can be described in the framework has a compressed pattern
matching algorithm as an instance.

Byte pair encoding (BPE, in short) [10], included in the framework of collage
systems, is a simple universal text compression scheme based on pattern
substitution [15]. The basic operation of the compression is to substitute a
single character which does not appear in the text for a pair of consecutive
characters which appears frequently in the text. This operation is repeated
until either all characters are used up or no pair of consecutive characters
appears frequently. Thus the compressed text consists of two parts: the
substitution table, and the substituted text. Decompression is very fast and
requires little work space. Moreover, partial decompression is possible, since
the compression depends only on the substitution. This is a big advantage of
BPE in comparison with adaptive dictionary based methods. Despite such
advantages, the BPE method has received little attention until now. The reason
for this is mainly the following two disadvantages: the compression is terribly
slow, and the compression ratio is not as good as that of other methods such as
Lempel-Ziv type compression.

In this paper, we bring out a potential advantage of BPE compression, that is,
we show that BPE is very suitable for speeding up pattern matching. Manber [20]
also introduced a slightly simpler compression method. However, since its
compression ratio is not so good, about 70% for typical English texts, the
improvement of the searching time cannot be better than this rate. The
compression ratio of BPE is about 60% for typical English texts, and is near
30% for biological sequences. We propose a compressed pattern matching
algorithm which is basically an instance of the general one mentioned above.
Experimental results show that, in CPU time comparison, the performance of the
proposed algorithm running on BPE compressed files of biological sequences is
better than that of Agrep running on uncompressed files of the same sequences.
This is not the case for English text files. Moreover, the results show that,
in elapsed time comparison, the algorithm clearly outperforms Agrep even for
English text files.

It should be stated that Moura et al. [9] proposed a compression scheme that
uses a word-based Huffman encoding with a byte-oriented code. The compression
ratio for typical English texts is about 30%. They presented a compressed
pattern matching algorithm and showed that it is twice as fast as Agrep on
uncompressed text in the case of exact match. However, the compression method
is not applicable to biological sequences because they cannot be segmented into
words. For the same reason, it cannot be used for natural language texts
written in Japanese, in which we have no blank symbols between words.

Recall that the key idea of the Boyer-Moore type algorithms is to skip symbols
of text, so that they do not read all the text symbols on the average. The
algorithms are intended to avoid ‘redundant’ symbol comparisons. Analogously,
our algorithm also skips symbols of text in the sense that more than one symbol
is encoded as one character code. In other words, our algorithm avoids
processing redundant information about the text. Note that the redundancy
varies depending on the pattern in the case of the Boyer-Moore type algorithms,
whereas it depends only on the text in the case of speeding up by compression.

The rest of the paper is organized as follows. In Section 2, we introduce the
byte pair encoding scheme, discuss its implementation, and estimate its
performance in comparison with Compress and Gzip. Section 3 is devoted to
compressed pattern matching in BPE compressed files, where we have two
implementations using the automata and the bit-parallel approaches. In Section
4, we report our experimental results to compare the practical behavior of
these algorithms. Section 5 concludes the discussion and explains some future
work.

Speeding Up Pattern Matching by Text Compression



2 Byte Pair Encoding 

In this section we describe the byte pair encoding scheme, discuss its imple- 
mentation, and then estimate the performance of this compression scheme in 
comparison with widely-known compression tools Compress and Gzip. 



2.1 Compression Algorithm 

The BPE compression is a simple version of the pattern-substitution method [10].
It utilizes the character codes which do not appear in the text to represent
frequently occurring strings, namely, strings whose frequencies are greater
than some threshold. The compression algorithm repeats the following task until
all character codes are used up or no frequent pairs appear in the text:

Find the most frequent pair of consecutive character codes in the text, and
then substitute an unused code for the occurrences of the pair.

For example, suppose that the text to be compressed is

$T_0$ = ABABCDEBDEFABDEABC.

Since the most frequent pair is AB, we substitute a code G for AB, and obtain
the new text

$T_1$ = GGCDEBDEFGDEGC.

Then the most frequent pair is DE, and we substitute a code H for it to obtain

$T_2$ = GGCHBHFGHGC.

By substituting a code I for GC, we obtain

$T_3$ = GIHBHFGHI.

The text length is shortened from $|T_0| = 18$ to $|T_3| = 9$. In exchange, we
have to encode the substitution pairs AB → G, DE → H, and GC → I.

More precisely, we encode a table which stores for every character code what
it represents. Note that a character code can represent either (1) the
character itself, (2) a code-pair, or (3) nothing. We call such a table a
substitution table. In practical implementations, an original text file is
split into a number of fixed-size blocks, and the compression algorithm is then
applied to each block. Therefore a substitution table is encoded for each block.
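The compression loop described in this section can be sketched as follows. This is a minimal illustration, not the implementation of [10]: it works on Python strings rather than bytes, takes the unused codes as an explicit argument, counts overlapping pairs naively, and uses a hypothetical threshold parameter `min_count`. The function names are also hypothetical.

```python
from collections import Counter


def bpe_compress(text, unused_codes, min_count=2):
    """Repeatedly replace the most frequent pair by the next unused code."""
    table = {}                      # code -> the pair it stands for
    codes = list(unused_codes)
    while codes:
        pairs = Counter(text[i:i + 2] for i in range(len(text) - 1))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < min_count:       # no frequent pair is left
            break
        code = codes.pop(0)
        table[code] = pair
        text = text.replace(pair, code)   # left-to-right, non-overlapping
    return text, table


def bpe_decompress(text, table):
    """Undo the substitutions, most recently introduced code first."""
    for code in reversed(list(table)):
        text = text.replace(code, table[code])
    return text
```

On the example text ABABCDEBDEFABDEABC with unused codes G, H, I, this sketch reproduces the sequence of substitutions shown above and ends with GIHBHFGHI.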
2.2 Speeding Up of Compression

In [10] an implementation of BPE compression is presented, which seems quite
simple. It requires $O(\ell N)$ time, where $N$ is the original text length and
$\ell$ is the number of character codes. The time complexity can be improved to
$O(\ell + N)$ by using a relatively simple technique, but this improvement did
not reduce the compression time in practice. Thus, we decided to reduce the
compression time at the cost of some compression ratio.

The idea is to use a substitution table obtained from a small part of the text
(e.g. the first block) for encoding the whole text. The disadvantage is that
the compression ratio becomes worse when the frequency distribution of
character pairs varies across parts of the text. The advantage is that a
substitution table is encoded only once. This is a desirable property from a
practical viewpoint of compressed pattern matching: since the table never
changes, any task which depends on it needs to be performed only once, as a
preprocessing step.

Fast execution of the substitutions according to the table is achieved by an 
efficient multiple key replacement technique [2,23], in which a one-way sequential 
transducer is built from a given collection of replacement pairs which performs 
the task in only one pass through a text. When the keys have overlaps, it replaces 
the longest possible first occurring key. The running time is linear in the total 
length of the original and the substituted text. 
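The replacement rule (leftmost occurrence first, longest key among overlapping candidates) can be illustrated with a small trie-based scan. This is not the one-way sequential transducer of [2,23]: it is a simpler stand-in that produces the same kind of output but does not guarantee linear time, since it rescans from each position. The function names are hypothetical, and keys are assumed to be non-empty strings.

```python
def build_trie(pairs):
    """Build a dict-based trie; the key None marks the end of a key."""
    root = {}
    for key, value in pairs.items():
        node = root
        for ch in key:
            node = node.setdefault(ch, {})
        node[None] = value
    return root


def replace_leftmost_longest(text, pairs):
    """Replace keys by values, preferring the longest key at each position."""
    root = build_trie(pairs)
    out, i = [], 0
    while i < len(text):
        node, j = root, i
        best_len, best_val = 0, None
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:            # a complete key ends here
                best_len, best_val = j - i, node[None]
        if best_len:
            out.append(best_val)
            i += best_len
        else:
            out.append(text[i])         # no key starts here: copy the symbol
            i += 1
    return "".join(out)
```

For instance, with the fully expanded keys AB → G, DE → H, ABC → I, the scan maps ABABCDEBDEFABDEABC to GIHBHFGHI.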



2.3 Comparison with Compress and Gzip 

We compared the performance of BPE compression with those of Compress and 
Gzip. We implemented the BPE compression algorithm both in the standard way 
described in [10] and in the modified way stated in Section 2.2. The Compress 
program has an option to specify in bits the upper bound to the number of 
strings in a dictionary, and we used Compress with specification of 12 bits and 
16 bits. Thus we tested five compression programs. 

We estimated the compression ratios of the five compression programs for the
four texts shown in Table 1. The results are shown in Table 2. We can see that
the compression ratios of BPE are worse than those of Compress and Gzip,
especially for English texts. We also estimated the CPU times for compression
and decompression. Although we omit the results here for lack of space, we
observed that the BPE compression was originally very slow, and that it is
drastically accelerated by the modification stated in Section 2.2. In fact, the
original BPE compression is 4 to 5 times slower than Gzip, whereas the modified
one is 4 to 5 times faster than Gzip and is competitive with Compress with the
12 bit option.

Table 1. Four Text Files.

file                       annotation
Brown corpus (6.4 Mbyte)   A well-known collection of English sentences, which
                           was compiled in the early 1960s at Brown University,
                           USA.
Medline (60.3 Mbyte)       A clinically-oriented subset of Medline, consisting
                           of 348,566 references.
Genbank1 (43.3 Mbyte)      A subset of the GenBank database, an annotated
                           collection of all publicly available DNA sequences.
Genbank2 (17.1 Mbyte)      The file obtained by removing all fields other than
                           accession number and nucleotide sequence from the
                           above one.

Thus, BPE is not so good by the traditional criteria. This is the reason why it
has received little attention until now. However, it has the following
properties, which are quite attractive from the practical viewpoint of
compressed pattern matching: (1) no bit-wise operations are required since all
the codes are of 8 bits; (2) decompression requires a very small amount of
memory; and (3) partial decompression is possible, that is, we can decompress
any portion of the compressed text.
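Property (3) follows from the fact that each code expands independently of its neighbours once the substitution table is known. A minimal sketch, assuming the table maps each substituted code to a two-symbol string and that any other symbol stands for itself (the function names are hypothetical):

```python
def expand(code, table):
    """Return the original string represented by a single code."""
    stack, out = [code], []
    while stack:
        c = stack.pop()
        if c in table:
            left, right = table[c]      # a code-pair: expand left part first
            stack.append(right)
            stack.append(left)
        else:
            out.append(c)               # the character itself
    return "".join(out)


def decompress_slice(compressed, table, start, stop):
    """Decompress only the codes compressed[start:stop]."""
    return "".join(expand(c, table) for c in compressed[start:stop])
```

The explicit stack holds only the codes still to be expanded, so the working memory stays small.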

In the next section, we will show how we can perform compressed pattern 
matching efficiently in the case of BPE compression. 



Table 2. Compression Ratios (%).

                          BPE                  Compress         Gzip
                          standard  modified   12bit   16bit
Brown corpus (6.8Mb)      51.08     59.02      51.67   43.75    39.04
Medline (60.3Mb)          56.20     59.07      54.32   42.34    33.35
Genbank1 (43.3Mb)         46.79     51.36      43.73   32.55    24.84
Genbank2 (17.1Mb)         30.80     32.50      29.63   26.80    23.15



3 Pattern Matching in BPE Compressed Texts 

For searching a compressed text, the most naive approach would be to apply any
string matching routine while expanding the original text on the fly. Another
approach is to encode a given pattern and apply any string matching routine in
order to find the encoded pattern directly in the compressed text. The problem
with this approach is that the encoded pattern is not unique. A solution due to
Manber [20] was to devise a way to restrict the number of possible encodings
for any string.

The approach we take here is basically an instance of the general compressed
pattern matching algorithm for strings described in terms of collage systems
[16]. As stated in the Introduction, the collage system is a unifying framework
that abstracts most existing dictionary-based compression methods. In the
framework, a string is described by a pair of a dictionary $\mathcal{D}$ and a
sequence $\mathcal{S}$ of tokens representing phrases in $\mathcal{D}$. A
dictionary $\mathcal{D}$ is a sequence of assignments where the basic operations
are concatenation, repetition, and prefix (suffix) truncation. A text
compressed by BPE is described by a collage system with no truncation
operations. For a collage system with no truncation, the general compressed
pattern matching algorithm runs in $O(\|\mathcal{D}\| + |\mathcal{S}| + m^2 + r)$
time using $O(\|\mathcal{D}\| + m^2)$ space, where $\|\mathcal{D}\|$ denotes the
size of the dictionary $\mathcal{D}$ and $|\mathcal{S}|$ is the length of the
sequence $\mathcal{S}$.

The basic idea of the general algorithm is to simulate the moves of the KMP
automaton for input $\mathcal{D}$ and $\mathcal{S}$. Note that one token of the
sequence $\mathcal{S}$ may represent a string of length more than one, which
causes a series of state transitions. The idea is to substitute just one state
transition for each such series of consecutive state transitions. More
formally, let $\delta : Q \times \Sigma \to Q$ be the state transition function
of the KMP automaton, where $\Sigma$ is the alphabet and $Q$ is the set of
states. Extend $\delta$ into the function
$\hat{\delta} : Q \times \Sigma^* \to Q$ by

$$\hat{\delta}(q, \varepsilon) = q \quad\text{and}\quad \hat{\delta}(q, ua) = \delta(\hat{\delta}(q, u), a),$$

where $q \in Q$, $u \in \Sigma^*$, and $a \in \Sigma$. Let $D$ be the set of
phrases in the dictionary. Let Jump be the restriction of $\hat{\delta}$ to the
domain $Q \times D$.

By identifying a token with the phrase it represents, we can define a new
automaton which takes as input a sequence of tokens and makes state transitions
by using Jump. The state transition of the new machine caused by a token
corresponds to the consecutive state transitions of the KMP automaton caused by
the phrase represented by the token. Thus, we can simulate the state
transitions of the KMP automaton by using the new machine. However, the KMP
automaton may pass through the final state during the consecutive transitions.
Hence the new machine should be a Mealy type sequential machine with output
function $Output : Q \times D \to 2^{\mathcal{N}}$ defined by

$$Output(q, u) = \{\, i \in \mathcal{N} \mid 1 \le i \le |u| \text{ and } \hat{\delta}(q, u[1..i]) \text{ is the final state} \,\},$$

where $\mathcal{N}$ denotes the set of natural numbers, and $u[1..i]$ denotes
the length $i$ prefix of string $u$.

In [16] efficient realizations of the functions Jump and Output were discussed
for the general case. In the case of BPE compression, a simpler implementation
is possible. We take two implementations. One is to realize the state
transition function Jump defined on $Q \times D$ as a two-dimensional array of
size $|Q| \times |D|$. The array size is not critical since the number of
phrases in $D$ is at most 256 in BPE compression. This is not the case with
LZW, in which $|D|$ can be the compressed text size.
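The table-based implementation can be sketched as follows. This is an illustration under stated assumptions, not the realization discussed in [16]: the KMP automaton is built as a full deterministic table over a given alphabet (which must contain every symbol of the pattern and of the expanded phrases), Jump and Output are filled by naively running each phrase from each state, and the dictionary is passed as a mapping from each token to its fully expanded phrase. All function names are hypothetical.

```python
def kmp_automaton(pattern, alphabet):
    """delta[q][a] = next state; state q means q pattern symbols are matched."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for a in alphabet:
        delta[0][a] = 1 if a == pattern[0] else 0
    x = 0                               # state to fall back to on a mismatch
    for q in range(1, m + 1):
        for a in alphabet:
            delta[q][a] = delta[x][a]
        if q < m:
            delta[q][pattern[q]] = q + 1
            x = delta[x][pattern[q]]
    return delta


def build_jump_output(delta, m, phrases):
    """Fill Jump and Output for every (state, token) pair."""
    jump = [dict() for _ in range(m + 1)]
    output = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for tok, u in phrases.items():
            state, hits = q, []
            for i, a in enumerate(u, start=1):
                state = delta[state][a]
                if state == m:          # final state reached inside the phrase
                    hits.append(i)
            jump[q][tok] = state
            output[q][tok] = hits
    return jump, output


def search_compressed(tokens, phrases, pattern, alphabet):
    """Return 1-based end positions (in the original text) of all occurrences."""
    m = len(pattern)
    delta = kmp_automaton(pattern, alphabet)
    jump, output = build_jump_output(delta, m, phrases)
    state, pos, ends = 0, 0, []
    for tok in tokens:                  # one table lookup per token
        ends.extend(pos + i for i in output[state][tok])
        state = jump[state][tok]
        pos += len(phrases[tok])
    return ends
```

With the phrases A to F standing for themselves and G, H, I standing for AB, DE, ABC, searching the token sequence GIHBHFGHI for BDE reports the occurrences ending at positions 10 and 15 of the original text. The scan itself touches each token once; only the preprocessing depends on the phrase lengths.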

Another implementation utilizes the bit-parallel paradigm, in a way similar to
what we did for LZW compression [17]. Technical details are omitted because of
lack of space.

4 Experimental Results 

We measured the running time of the proposed algorithms running on BPE
compressed files. We tested the two implementations mentioned in the previous
section. For comparison, we tested the algorithm of [17] in searching LZW
compressed files. We also tested the KMP algorithm, the Shift-Or algorithm
[26,3], and Agrep (the Boyer-Moore-Horspool algorithm) in searching
uncompressed files. The performance of the BM type algorithm strongly depends
upon the pattern length m, and therefore the running time of Agrep was tested
for m = 4, 8, 16. The performance of each algorithm other than Agrep is
independent of the pattern length. The text files we used are the same as the
four text files mentioned in Section 2. The machine used is a PC with a Pentium
III processor at 500MHz running the TurboLinux 4.0 operating system. The data
transfer speed was about 7.7 Mbyte/sec.

The results are shown in Table 3, where we included the preprocessing time. 
In this table, (a) and (b) stand for the automata and the bit-parallel implemen- 
tations stated in the previous section, respectively. 



Table 3. Performance Comparisons.

                              BPE           LZW    uncompressed
                              (a)    (b)    [17]   KMP    Shift-Or  Agrep
                                                                    m=4    m=8    m=16
CPU time      Brown Corpus    0.09   0.16   0.94   0.13   0.11      0.09   0.07   0.07
(sec)         Medline         1.03   1.43   6.98   1.48   1.28      0.85   0.69   0.63
              Genbank1        0.52   0.89   4.17   0.81   0.76      0.72   0.58   0.53
              Genbank2        0.13   0.22   1.33   0.32   0.29      0.27   0.32   0.32
elapsed time  Brown Corpus    0.59   0.54   1.17   0.91   1.01      0.91   0.90   0.90
(sec)         Medline         4.98   4.95   7.53   8.38   8.26      8.01   7.89   7.99
              Genbank1        3.04   2.95   4.48   6.26   6.32      6.08   5.67   5.64
              Genbank2        0.76   0.73   1.46   2.28   2.33      2.19   2.18   2.14

First of all, it is observed that, in CPU time comparison, the automata-based
implementation of the proposed algorithm in searching BPE compressed files is
faster than each of the routines except Agrep. Compared with Agrep, it is good
for Genbank1 and Genbank2, but not so for the other two files. The reason for
this is that the performance of the proposed algorithm depends on the
compression ratio. Recall that Genbank1 and Genbank2 are compressed relatively
well (that is, to a smaller fraction of their original size) in comparison with
Brown corpus and Medline.

From a practical viewpoint, the running speed in elapsed time is also
important, although it is not easy to measure accurate values of elapsed time.
Table 3 indicates that the proposed algorithm is the fastest in the elapsed
time comparison.

5 Conclusion 



We have shown potential advantages of BPE compression from a viewpoint of 
compressed pattern matching. 
The number of tokens in BPE is limited to 256 so that all the tokens are
encoded in 8 bits. The compression ratio can be improved if we raise the limit
on the number of tokens. A further improvement is possible by using
variable-length codewords. However, it is preferable to use fixed-length
codewords of 8 bits from the viewpoint of compressed pattern matching, since we
want to keep the search at the byte level for efficiency.

One future direction of this study will be to develop approximate pattern 
matching algorithms for BPE compressed text. 